Deciphering Foreign Language NLP Sujith Ravi and Kevin Knight - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Deciphering Foreign Language

NLP

Sujith Ravi and Kevin Knight

sravi@usc.edu,
knight@isi.edu

Information Sciences Institute University of Southern California

slide-10
SLIDE 10

Statistical Machine Translation (MT)

Current MT systems

Bilingual text:
(Spanish) Garcia y asociados
(English) Garcia and associates
(Spanish) sus grupos están en Europa
(English) its groups are in Europe
...

➡ TRAIN ➡ Translation tables:
associates / asociados : 0.8
groups / grupos : 0.9
...

Language pairs: Spanish/English, Japanese/German, Malayalam/English, Swahili/German, ...

BOTTLENECK

Can we get rid of parallel data?

Monolingual corpora: Spanish text (PLENTY), English text (PLENTY)

➡ TRAIN ➡ Translation tables:
associates / asociados : 0.8
...

slide-13
SLIDE 13

Getting Rid of Parallel Data

Machine Translation without parallel data

Spanish text, English text (monolingual corpora) ➡ TRAIN ➡ Translation tables:
associates / asociados : 0.8
...

  • MT system trained on non-parallel data

➡ useful for rare language pairs (limited/no parallel data)

  • Goal: not to beat existing MT systems; instead, ask
  • Can we build a reasonably good MT system from scratch without any parallel data?

➡ monolingual resources available in plenty

slide-14
SLIDE 14

Related Work

  • Extracting bilingual lexical connections from comparable corpora

➡ exploit word context frequencies (Fung, 1995; Rapp, 1995; Koehn & Knight, 2001)

➡ Canonical Correlation Analysis (CCA) method (Haghighi & Klein, 2008)

  • Mining parallel sentence pairs for MT training using comparable corpora (Munteanu et al., 2004)

➡ need dictionary, some initial parallel data

slide-15
SLIDE 15

Our Contributions

✓ MT system built from scratch without parallel data

➡ novel decipherment approach for translation
➡ novel methods for training translation models from non-parallel text
➡ Bayesian training for IBM Model 3 translation model

✓ Novel methods to deal with large-scale vocabularies inherent in MT problems

✓ Empirical studies for MT decipherment

slide-16
SLIDE 16

Rest of this Talk

  • Introduction
  • Related Work
  • New Idea for Language Translation

➡ Step 1: Word Substitution
➡ Step 2: Foreign Language as a Cipher

  • Conclusion
slide-18
SLIDE 18

Cracking the MT Code

“When I look at an article in Spanish, I say to myself, this is really English, but it has been encoded in some strange symbols. Now I will proceed to decode...” Warren Weaver (1947)

  • Ciphertext (Spanish): este es un sistema de cifrado complejo
  • Plaintext (English): this is a complex cipher

slide-23
SLIDE 23

MT Decipherment without Parallel Data

Spanish corpus (f):

El portal web permite la búsqueda por todo tipo de métodos. Por un lado, Wikileaks ha ordenado la documentación por diferentes categorías atendiendo a los hechos más notables. Desde el tipo de suceso (evento criminal, fuego amigo, ...

English corpus:

(CNN) WikiLeaks website publishes classified military documents from Iraq. The whistle-blower website WikiLeaks published nearly 400,000 classified military documents from the Iraq war on Friday, calling it the largest classified military leak in history,....

English corpus ➡ Language Model Training ➡ English Language Model P(e)

P(e) ➡ e (English) ➡ English-to-Spanish Translation Model P(f | e) ➡ f

English-to-Spanish Translation Model P(f | e): key = ?

For each f

  • alignments = hidden
  • e translation = hidden

TRAINING

Train parameters θ to maximize probability of observed foreign text f:

\[ \arg\max_\theta P_\theta(f) \approx \arg\max_\theta \sum_e P_\theta(e, f) \approx \arg\max_\theta \sum_e P(e) \cdot P_\theta(f \mid e) \]
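The training objective on this slide can be made concrete with a toy sketch. It assumes a tiny hand-written bigram English LM and a word-for-word channel with no insertion, deletion, or re-ordering, which is far simpler than the translation models in the talk. All names, vocabularies, and probabilities below are hypothetical and chosen only for illustration.

```python
import itertools

# Hypothetical toy English bigram LM: P(word | previous word); "<s>" starts a sentence.
BIGRAM = {
    ("<s>", "the"): 0.9, ("<s>", "dog"): 0.1,
    ("the", "dog"): 0.8, ("the", "the"): 0.2,
    ("dog", "the"): 0.5, ("dog", "dog"): 0.5,
}
ENGLISH_VOCAB = ["the", "dog"]

def lm_prob(e_words):
    """P(e) under the toy bigram LM."""
    p, prev = 1.0, "<s>"
    for w in e_words:
        p *= BIGRAM[(prev, w)]
        prev = w
    return p

def channel_prob(f_words, e_words, theta):
    """P_theta(f | e) for a word-for-word channel (an assumed simplification)."""
    p = 1.0
    for f, e in zip(f_words, e_words):
        p *= theta[e].get(f, 0.0)
    return p

def marginal_likelihood(f_words, theta):
    """P_theta(f) = sum over all hidden English strings e of P(e) * P_theta(f | e)."""
    total = 0.0
    for e_words in itertools.product(ENGLISH_VOCAB, repeat=len(f_words)):
        total += lm_prob(e_words) * channel_prob(f_words, e_words, theta)
    return total

# Two candidate keys theta[e][f]; training searches for parameters that score higher.
good_key = {"the": {"el": 0.9, "perro": 0.1}, "dog": {"el": 0.1, "perro": 0.9}}
bad_key = {"the": {"el": 0.1, "perro": 0.9}, "dog": {"el": 0.9, "perro": 0.1}}

f = ["el", "perro"]
p_good = marginal_likelihood(f, good_key)
p_bad = marginal_likelihood(f, bad_key)
print(p_good, p_bad)
```

The brute-force sum over English strings is only feasible for toy vocabularies; it is shown here to make the hidden variable e explicit, not as a practical training method.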

slide-26
SLIDE 26

Characteristics of Decipherment Key

MT (hard problem!)

  • Determinism in key? many-to-many
  • Linguistic unit of substitution: Word / Phrase
  • Scale (vocabulary & data sizes): Large (100 - 1M word types)

Word Substitution (tackle a simpler problem first!)

  • Determinism in key? 1-to-1
  • Linguistic unit of substitution: Word
  • Scale (vocabulary & data sizes): Large (100 - 1M word types)

Other key characteristics compared: Transposition (re-ordering), Insertion, Deletion

slide-27
SLIDE 27

Rest of this Talk

  • Introduction
  • Related Work
  • New Idea for Language Translation
  • Word Substitution
  • Foreign Language as a Cipher
  • Conclusion
slide-28
SLIDE 28

Word Substitution

(on the road to MT)

What does this say in English?

Like unsupervised POS tagging, except there are hundreds of thousands of “tags” per cipher token (tag = English word)

95 90 76 31 95 20 43 11 80 60 16 95 65 31 50 42 16 65 31 50 58 42 16 19 95 16 92 73 16 65 67 57 31 26 65 38 70 52 57 30 33 26 16 47 24 56 21 16 ... ...

English words masked by cipher symbols

slide-29
SLIDE 29

Word Substitution

(on the road to MT)

What does this say in English?

Like unsupervised POS tagging, except there are hundreds of thousands of “tags” per cipher token (tag = English word)

95 90 76 31 95 20 43 11 80 60 16 95 65 31 50 42 16 65 31 50 58 42 16 19 95 16 92 73 16 65 67 57 31 26 65 38 70 52 57 30 33 26 16 47 24 56 21 16 ... ...

Learn substitution mappings between English words and cipher symbols
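A word substitution cipher of this kind can be produced by masking every English word type with its own symbol. The sketch below is one possible way to build such test data; the function name, the seed, and the two-digit symbol format are assumptions, not details taken from the talk.

```python
import random

def make_word_cipher(sentences, seed=0):
    """Mask each English word type with a unique numeric cipher symbol (1-to-1 key)."""
    vocab = sorted({w for s in sentences for w in s.split()})
    symbols = list(range(len(vocab)))
    random.Random(seed).shuffle(symbols)  # hide any alphabetical ordering
    key = {w: f"{sym:02d}" for w, sym in zip(vocab, symbols)}
    cipher = [" ".join(key[w] for w in s.split()) for s in sentences]
    return cipher, key

sentences = ["this is a test", "this is english"]
cipher, key = make_word_cipher(sentences)

# Decipherment must recover the inverse of `key` from `cipher` alone.
inverse = {sym: w for w, sym in key.items()}
decoded = [" ".join(inverse[s] for s in line.split()) for line in cipher]
print(cipher, decoded)
```

Because the key is deterministic and 1-to-1, the same English word always maps to the same symbol, which is exactly the regularity a language model can exploit.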

slide-33
SLIDE 33

Word Substitution Decipherment

English corpus ➡ LM Training ➡ English Language Model P(e)

P(e) ➡ e (English) ➡ Substitution Table P(c | e) ➡ Ciphertext c

Substitution Table P(c | e): key = ?

Ciphertext c:

95 90 76 31 95 20 43 11 80 60 16 95 65 31 50 42 16 65 31 50 58 42 16 19 95 16 92 73 16 65 67 57 31 26 65 38 70 52 57 30 33 26 16 47 24 56 21 16 ... ...

slide-36
SLIDE 36

Word Substitution Decipherment

P(e) ➡ e (English) ➡ P(c | e) ➡ Ciphertext c

key = ? (key contains millions of parameters!!!)

  • 1. EM Decipherment (New)

➡ EM using new iterative training procedure

  • 2. Bayesian Decipherment (New)

➡ Bayesian inference (efficient, parallelized sampling)

slide-37
SLIDE 37

Word Substitution: (1) EM

  • EM isn't easy

– Space:

  • need to store & update [...] parameters

– Time:

  • Like POS tagging, but [...] possible “tags” per cipher token

– Solution: Iterative EM (New)

  • Build [...] x [...] channel (with UNK word)
  • EM assigns UNK to some cipher tokens
  • Eliminate parameters
  • Expand to [...] x [...], etc.

  • Decoding: Viterbi decoding using trained channel (details in paper)
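The decoding step can be sketched as a standard Viterbi search over English words, combining a bigram LM with a trained channel. This is a generic illustration under assumed data structures (`lm`, `channel`, and the `floor` smoothing value are hypothetical); the talk defers the actual details to the paper.

```python
import math

def viterbi_decode(cipher, vocab, lm, channel, floor=1e-12):
    """Return the English sequence e maximizing P(e) * P(c | e).

    lm[(prev, word)] is a bigram probability ("<s>" marks sentence start);
    channel[word][symbol] is P(symbol | word). Missing entries get `floor`.
    """
    def lm_lp(prev, word):
        return math.log(lm.get((prev, word), floor))

    def ch_lp(word, sym):
        return math.log(channel.get(word, {}).get(sym, floor))

    # best[w] = (log score of the best path ending in w, that path)
    best = {w: (lm_lp("<s>", w) + ch_lp(w, cipher[0]), [w]) for w in vocab}
    for sym in cipher[1:]:
        new_best = {}
        for w in vocab:
            score, path = max(
                (best[p][0] + lm_lp(p, w), best[p][1]) for p in vocab
            )
            new_best[w] = (score + ch_lp(w, sym), path + [w])
        best = new_best
    return max(best.values())[1]

# Toy example: the channel is ambiguous for "22" and "33"; the LM resolves it.
vocab = ["this", "is", "good"]
lm = {("<s>", "this"): 0.9, ("this", "is"): 0.9, ("is", "good"): 0.9,
      ("this", "good"): 0.05, ("good", "is"): 0.05}
channel = {"this": {"11": 1.0},
           "is": {"22": 0.5, "33": 0.5},
           "good": {"22": 0.5, "33": 0.5}}
decoded = viterbi_decode(["11", "22", "33"], vocab, lm, channel)
print(decoded)
```

The search costs O(n · |V|²) for n cipher tokens, so with realistic vocabularies some pruning of candidate words per token would be needed.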
slide-38
SLIDE 38

Word Substitution: (2) Bayesian (New)

  • Several advantages over EM

✓ efficient inference, even with higher-order LMs

➡ incremental scoring of derivations during sampling

✓ novel sample-selection strategy permits fast training
✓ no memory bottleneck
✓ sparse priors help learn skewed distributions

slide-39
SLIDE 39

Word Substitution: (2) Bayesian (continued)

  • Same generative story as EM, replace models with Chinese Restaurant Process (CRP) formulations

➡ Base distributions (P0): source = English LM probabilities, channel = uniform
➡ Priors: source (α) = 10^4, channel (β) = 0.01

  • Inference via point-wise Gibbs sampling

➡ Smart sample-choice selection
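As a rough illustration of a CRP channel with a uniform base distribution and β = 0.01, the sketch below uses the standard CRP predictive rule P(c | e) = (β · P0(c) + count(e, c)) / (β + count(e)). Whether the talk's models use exactly this form is an assumption, and the class and method names are hypothetical.

```python
from collections import Counter

class CRPChannel:
    """Channel P(c | e) under a CRP cache with a uniform base distribution."""

    def __init__(self, num_cipher_types, beta=0.01):
        self.beta = beta
        self.p0 = 1.0 / num_cipher_types  # uniform base distribution P0
        self.pair_counts = Counter()      # count(e, c) in the current sample
        self.word_counts = Counter()      # count(e) in the current sample

    def prob(self, c, e):
        """Predictive probability of cipher symbol c given English word e."""
        num = self.beta * self.p0 + self.pair_counts[(e, c)]
        return num / (self.beta + self.word_counts[e])

    def add(self, c, e):
        """Record that e was paired with c in the current sample."""
        self.pair_counts[(e, c)] += 1
        self.word_counts[e] += 1

    def remove(self, c, e):
        """Undo a pairing before resampling that position."""
        self.pair_counts[(e, c)] -= 1
        self.word_counts[e] -= 1

symbols = ["95", "90", "76", "31"]
ch = CRPChannel(num_cipher_types=len(symbols))
p_before = ch.prob("95", "the")   # no counts yet: falls back to the base P0
ch.add("95", "the")
ch.add("95", "the")
p_after = ch.prob("95", "the")    # cached pairings now dominate
total = sum(ch.prob(s, "the") for s in symbols)
print(p_before, p_after, total)
```

With a small β the cache term quickly outweighs the base distribution, which is how a sparse prior pushes each English word toward a few cipher symbols.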

slide-40
SLIDE 40

Smart Sample-Choice Selection for Bayesian Decipherment

cipher: 22 94 43 04 98
current: three eight living here ?
resample: three the living here ?
resample: three a living here ?
resample: three and living here ?
resample: three boys living here ?
resample: three gun living here ?
resample: three brick living here ?
resample: three ran living here ?
...

  • [...] re-sampling this cipher token?
  • Thousands or millions of cipher tokens per epoch!
  • Current solution: sample only top 100 plaintext choices
  • Offline computation: Given X and Z, what are top-100 Y ranked by P(X Y Z)?
  • If X and Z never co-occurred, then pick 100 random words!
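The candidate-selection idea can be sketched as an offline table from each context (X, Z) to its most likely middle words Y, with a random fallback for unseen contexts. The sketch ranks Y by raw trigram counts as a stand-in for P(X Y Z), which is an assumption; function names and the toy corpus are hypothetical.

```python
import random
from collections import Counter, defaultdict

def build_candidate_table(corpus, k=100):
    """Offline: for each context (X, Z), list the top-k middle words Y by trigram count."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for x, y, z in zip(words, words[1:], words[2:]):
            counts[(x, z)][y] += 1
    return {ctx: [y for y, _ in c.most_common(k)] for ctx, c in counts.items()}

def candidates(x, z, table, vocab, k=100, rng=random):
    """Plaintext choices to consider when resampling the word between X and Z."""
    if (x, z) in table:
        return table[(x, z)]
    # X and Z never co-occurred: fall back to k random words.
    return rng.sample(vocab, min(k, len(vocab)))

corpus = [
    "three boys living here",
    "three men living here",
    "three boys living there",
]
vocab = sorted({w for s in corpus for w in s.split()})
table = build_candidate_table(corpus, k=2)

seen = candidates("three", "living", table, vocab, k=2)
unseen = candidates("here", "three", table, vocab, k=2, rng=random.Random(0))
print(seen, unseen)
```

Restricting each Gibbs step to a short candidate list keeps the per-token cost independent of the full vocabulary size.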

slide-41
SLIDE 41

Word Substitution: (2) Bayesian (continued)

  • Same generative story as EM, replace models with Chinese Restaurant Process (CRP) formulations

➡ Base distributions (P0): source = English LM probabilities, channel = uniform
➡ Priors: source (α) = 10^4, channel (β) = 0.01

  • Inference via point-wise Gibbs sampling

➡ Smart sample-choice selection
➡ Parallelized sampling using MapReduce (3 to 5-fold faster)

  • Decoding: Extract trained channel from final sample, Viterbi-decode (details in paper)

slide-42
SLIDE 42

Word Substitution Results

Word Substitution

slide-43
SLIDE 43

Sample Decipherments

Word Substitution

Cipher Original English Deciphered

slide-44
SLIDE 44

Rest of this Talk

  • Introduction
  • Related Work
  • New Idea for Language Translation
  • Word Substitution
  • Foreign Language as a Cipher
  • Conclusion
slide-49
SLIDE 49

Foreign Language as a Cipher

Machine Translation without parallel data

English corpus ➡ LM Training ➡ English Language Model P(e)

P(e) ➡ e (English) ➡ English-to-Foreign Translation Model P(f | e) ➡ Foreign text f (Spanish corpus)

English-to-Foreign Translation Model P(f | e): key = ?

Word Substitution + transposition + insertion + deletion + ... = Machine Translation without parallel data

slide-51
SLIDE 51

Deciphering Foreign Language

  • P(f | e) can be any translation model used in MT (e.g., IBM Model 3), but without parallel data, training is intractable

  • 1. EM Decipherment (New)

➡ Simple generative story
➡ model can be trained efficiently using EM

  • 2. Bayesian Decipherment (New)

➡ IBM Model 3 generative story
➡ Bayesian inference (efficient, parallelized sampling)

slide-52
SLIDE 52

MT Decipherment: (1) EM (New)

  • Simple generative story (avoid large-sized fertility, distortion tables)

✓ word substitutions ✓ insertions ✓ deletions ✓ local re-ordering

  • Model can be trained efficiently using EM
  • Use prior linguistic knowledge for decipherment

➡ morphology, linguistic constraints (e.g., “8” in English maps to “8” in Spanish)

  • Whole-segment language models instead of word n-gram LMs

➡ e.g., ARE YOU TALKING ABOUT ME ? / THANK YOU TALKING ABOUT ?

slide-55
SLIDE 55

MT Decipherment: (2) Bayesian (New)

  • Generative story: IBM Model 3 (popular)

➡ translation (word substitution)
➡ fertility
➡ distortion (transposition)

  • Complex model, makes inference very hard
  • New translation model

➡ replace IBM Model 3 components with CRP processes (CRP cache model)

slide-56
SLIDE 56

MT Decipherment: (2) Bayesian (continued)

  • Bayesian inference for estimating translation model

➡ efficient, scalable inference using strategies described earlier

  • Sampling IBM Model 3

➡ point-wise Gibbs sampling: for each foreign string f, jointly sample alignments, e translations
➡ sampling operators = translate 1 word, swap alignments, ... (similar to Germann et al., 2001)
➡ blocked sampling: sample single derivation for repeating sentences

  • Choose the final sample as MT decipherment output

slide-58
SLIDE 58

Time Expressions

Spanish corpus:

... 10 días consecutivos de cotización 10 semanas consecutivas 100 años después ... 17 de abril 1986 ... años cuarto puesto enero alrededor. 28 ... mil años antes mil años ... otros 12 meses más o menos ... una de tres horas uno de tres años un jueves por la noche Un día hace poco ...

English corpus:

... 10 MONTHS LATER 10 MORE YEARS 24 MINUTES 28 CONSECUTIVE QUARTERS ... A WEEK EARLIER ABOUT A DECADE AGO ABOUT A MONTH AFTER ... AUGUST 6 , 1789 ... CENTURIES AGO DEC . 11 , 1989 ... TWO DAYS LATER TWO DECADES LATER TWO FULL DAYS ... YEARS ...

Results (BLEU score, ↑ higher is better)

  • Baseline without parallel data: 18.2
  • Decipherment without parallel data: 48.7
  • MOSES with parallel data: 88

slide-60
SLIDE 60

Movie Subtitles

OPUS Spanish/English corpus [ Tiedemann, 2009 ]

English corpus:

... ALL RIGHT , LET' S GO . ARE YOU ALL RIGHT ? ARE YOU CRAZY ? ... HEY , DO YOU WANT TO COME OUT AND PLAY THE GAME ? ... WHAT ARE YOU DOING HERE ? ... YEAH ! YOU KNOW WHAT I MEAN ? ...

Spanish corpus:

... abran la puerta . bien hecho . ... ¡ por aquí ! ¿ a qué te refieres ? ¿ cómo podré verlos a través de mis lágrimas ? oye , ¿ quieres salir y jugar el juego ? ... un segundo . vamonos . ...

Results (BLEU score, ↑ higher is better)

  • Decipherment without parallel data: 19.3
  • MOSES with parallel data: 63.6

slide-61
SLIDE 61

MT Accuracy vs. Data Size

[Figure: MT quality on test set vs. data size. Curves: Phrase-based MT, parallel data; IBM Model 3 - distortion, parallel data; EM Decipherment, no parallel data]

slide-63
SLIDE 63

Conclusion

  • Language translation without parallel data

➡ very challenging task, but shown to be possible! (using decipherment approach)
➡ initial results promising
➡ can easily extend to new language pairs, domains

  • Future Work

➡ Scalable decipherment methods for full-scale MT
➡ Better unsupervised algorithms for decipherment
➡ Leverage existing bilingual resources (e.g., dictionaries) during decipherment
➡ Applications for domain adaptation

slide-65
SLIDE 65

What else can Decipherment Do?

  • Language Translation (This talk)

➡ Spanish text, English text (monolingual corpora) ➡ TRAIN ➡ Translation tables (associates / asociados : 0.8, ...)

  • Cryptanalysis (Afternoon talk: 2pm, Machine Learning Session 2-B)

➡ CATCH A SERIAL KILLER

slide-66
SLIDE 66

THANK YOU!
