Statistical Machine Translation

What works and what does not

Andreas Maletti

Universität Stuttgart
maletti@ims.uni-stuttgart.de

Stuttgart, May 14, 2013

Statistical Machine Translation

  • A. Maletti

· 1

Main notions

Machine translation (MT)

Automatic natural language translation (by a computer), as opposed to:
◮ manual translation
◮ computer-aided translation (e.g., translation memory)

Statistical machine translation (SMT)

MT using systems automatically obtained from (many) translations, as opposed to:
◮ rule-based machine translation (old), e.g., SYSTRAN
◮ example-based machine translation, i.e., translation by analogy

Short history

Timeline

1. Dark age (1960s–1990s)
◮ rule-based systems (e.g., SYSTRAN)
◮ Chomskyan approach
◮ perfect translation, poor coverage

2. Reformation (1991–present)
◮ phrase-based and syntax-based systems
◮ statistical approach
◮ cheap, automatically trained

3. Potential future
◮ semantics-based systems (e.g., FrameNet-based)
◮ semi-supervised, statistical approach
◮ basic understanding of translated text

Standard pipeline

Schema

Input → Translation model → Language model → Output

(the models are often integrated in practice)

Required resources

◮ bilingual text (sentences in both languages): 1.5M sentences
◮ monolingual text (in the target language): 44M sentences
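The two resources feed the two models: the bilingual text trains the translation model, the monolingual text trains the language model, and the decoder picks the candidate that both models like. A minimal sketch of that decision, assuming a log-linear sum of log-probabilities with hypothetical weights `w_tm` and `w_lm`, a toy bigram language model with add-one smoothing trained on three made-up German sentences, and invented translation-model scores. None of these numbers, names or modeling choices come from the talk.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Collect bigram statistics from monolingual target-language sentences."""
    history, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        history.update(tokens[:-1])              # tokens that precede another token
        bigrams.update(zip(tokens, tokens[1:]))
        vocab.update(sentence.split())
    return history, bigrams, len(vocab)

def lm_logprob(sentence, lm):
    """Add-one smoothed bigram log-probability of a whole sentence."""
    history, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (history[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )

def best_translation(tm_scores, lm, w_tm=1.0, w_lm=1.0):
    """Return the candidate maximizing w_tm * log P_tm + w_lm * log P_lm."""
    return max(
        tm_scores,
        key=lambda cand: w_tm * tm_scores[cand] + w_lm * lm_logprob(cand, lm),
    )

# Hypothetical monolingual (target-language) training text, lowercased.
monolingual = [
    "was funktioniert hier",
    "und was nicht",
    "das funktioniert und das nicht",
]
lm = train_bigram_lm(monolingual)

# Hypothetical translation-model log-probabilities for two candidates.
tm_scores = {
    "was funktioniert und was nicht": -2.0,
    "was am und was nicht funktioniert": -1.5,
}

print(best_translation(tm_scores, lm))            # was funktioniert und was nicht
print(best_translation(tm_scores, lm, w_lm=0.0))  # was am und was nicht funktioniert
```

With the language model switched off (`w_lm=0.0`) the translation model's slight preference wins; with it on, the more fluent word order wins. This is one simple way of "integrating" the models; real systems use richer features and tuned weights.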

Standard pipeline

Example (source: Google Translate)

Input: What works and what does not

Segmentation: What works and what does not

Translation model output:
Was funktioniert und was nicht
Was am und was nicht funktioniert
Was funktioniert am und welche nicht ist und was nicht
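The segmentation step splits the input into contiguous phrases, and any single example shows only one of many possible splits. A minimal sketch that enumerates all of them; the optional `max_len` cap is an illustrative assumption (phrase-based systems commonly limit phrase length, but the talk gives no value). For n words there are 2^(n-1) splits, because each of the n-1 gaps between words either is or is not a phrase boundary.

```python
from typing import Iterator, List, Optional

def segmentations(words: List[str], max_len: Optional[int] = None) -> Iterator[List[str]]:
    """Yield every split of `words` into contiguous, non-empty phrases.

    `max_len` (hypothetical parameter) optionally caps the words per phrase.
    """
    if not words:
        yield []
        return
    limit = len(words) if max_len is None else min(max_len, len(words))
    for cut in range(1, limit + 1):
        head = " ".join(words[:cut])                 # first phrase
        for rest in segmentations(words[cut:], max_len):
            yield [head] + rest                      # combine with splits of the rest

words = "What works and what does not".split()
all_segs = list(segmentations(words))
print(len(all_segs))                               # 32 = 2 ** (6 - 1)
print(len(list(segmentations(words, max_len=2))))  # 13 with at most 2 words per phrase
```

The count grows exponentially with sentence length, which is one reason a decoder scores partial hypotheses and prunes instead of listing every segmentation.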

Phrase-based machine translation

English: And then the matter was decided, and everything was put in place
Arabic (transliterated): f kAn An tm AlHsm w wDEt Almwr fy nSAb hA

Extracted information

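Segmentation:
(1) And then | (2) the matter | (3) was decided | (4) , and everything | (5) was put | (6) in place

Phrase translation:
(1) f kAn | (2) Almwr | (3) An tm AlHsm | (4) w | (5) wDEt | (6) fy nSAb hA

Reordering: (1 3 4 5 2 6), i.e., the translated phrases appear in the output in the order 1, 3, 4, 5, 2, 6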
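The three pieces of extracted information are enough to rebuild the transliterated Arabic sentence. A minimal sketch, assuming that the reordering lists, for each output position, which translated phrase goes there; under that reading it reproduces the sentence on the slide exactly. A phrase-based decoder does the reverse: it searches over many segmentations, phrase translations and reorderings and scores them, rather than applying one that is given.

```python
# Segmented English input and its phrase translations, as read off the slide.
source_phrases = [
    "And then", "the matter", "was decided",
    ", and everything", "was put", "in place",
]
phrase_translation = {
    "And then": "f kAn",
    "the matter": "Almwr",
    "was decided": "An tm AlHsm",
    ", and everything": "w",
    "was put": "wDEt",
    "in place": "fy nSAb hA",
}
# Output position k takes translated phrase number reordering[k] (1-based).
reordering = (1, 3, 4, 5, 2, 6)

def translate(phrases, table, order):
    """Translate each phrase independently, then permute the results."""
    translated = [table[p] for p in phrases]
    return " ".join(translated[i - 1] for i in order)

print(translate(source_phrases, phrase_translation, reordering))
# f kAn An tm AlHsm w wDEt Almwr fy nSAb hA
```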

How it works

Technical talks

◮ Marion Weller: phrase-based MT
◮ Daniel Quernheim and Nina Seemann: syntax-based MT

Small players

Research at IMS

Phrase-based MT (head: Dr. Alexander Fraser)

◮ Fabienne Braune
◮ Fabienne Cap
◮ Anita Ramm
◮ Marion Weller

Syntax-based MT (head: Dr. Andreas Maletti)

◮ Fabienne Braune
◮ Daniel Quernheim
◮ Nina Seemann

Big players

Commercial systems

◮ Language Studio
◮ Google Translate
◮ WebSphere Translation Server
◮ Bing Translator
◮ Omnifluent
◮ . . .

Soon also

Failures

Applications

Technical manuals

Example (An mp3 player)

The synchronous manifestation of lyrics is a procedure for can broadcasting the music, waiting the mp3 file at the same time showing the lyrics. With the this kind method that the equipments that synchronous function of support up broadcast to make use of document create setup, you can pass the LCD window way the check at the document contents that broadcast. That procedure returns offerings to have to modify, and delete, and stick top , keep etc. edit function.

Example (Hotel Uppsala, Sweden)

Wir hatten die Zimmer eingestuft wird als “Superior” weil sie renoviert wurde im letzten Jahr oder zwei. Unsere Zimmer hatten Parkettboden und waren sehr geräumig. Man musste allerdings nicht musste seitwärts bewegen.

English original: We stayed in rooms classified as “superior” because they had been renovated in the last year or two. Our rooms had wood floors and were roomy. You didn’t have to walk sideways to move around.

US military

Example (Jones, Shen, Herzog 2009)

Soldier: Okay, what is your name?
Local: Abdul.
Soldier: And your last name?
Local: Al Farran.

Speech-to-text machine translation:
Soldier: Okay, what’s your name?
Local: milk a mechanic and I am here I mean yes
Soldier: What is your last name?
Local: every two weeks my son’s name is ismail

MSDN, Knowledge Base . . .

But in many cases it actually works . . .

Selected application

Lecture translation

◮ real-time speech-to-text machine translation
◮ combines automatic speech recognition and SMT
◮ requires lecturer training and terminology training
◮ automatically provides subtitles for the lecture video

Video

http://www.youtube.com/watch?v=x5lL0wpr-88
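The lecture-translation setup is a cascade: speech recognition produces timed text, SMT translates it, and the result is shown as subtitles. A minimal sketch of that glue, assuming the recognizer delivers (start, end, text) segments and that subtitles are written in the common SRT format. `asr_output`, `toy_lexicon` and `toy_translate` are hypothetical stand-ins for a real recognizer and SMT system, and nothing here reflects how the system in the video is actually built.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def subtitles(asr_segments, translate):
    """Translate each timed ASR segment and emit numbered SRT subtitle blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(asr_segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{translate(text)}\n"
        )
    return "\n".join(blocks)  # a blank line separates consecutive blocks

# Hypothetical ASR output: (start seconds, end seconds, recognized text).
asr_output = [
    (0.0, 2.5, "guten morgen"),
    (2.5, 61.25, "maschinelle übersetzung"),
]

# Hypothetical stand-in for the SMT step: word-by-word dictionary lookup.
toy_lexicon = {"guten": "good", "morgen": "morning",
               "maschinelle": "machine", "übersetzung": "translation"}

def toy_translate(text):
    return " ".join(toy_lexicon.get(word, word) for word in text.split())

print(subtitles(asr_output, toy_translate))
```

Keeping the translation step behind a plain function argument makes it easy to swap the toy lookup for a real SMT call without touching the timing and formatting code.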

Summary

SMT works well

◮ between similar languages (e.g., Spanish-English)
◮ between languages with large resources (e.g., French-English)
◮ in-domain (training and test data from the same domain)

→ access to foreign language

SMT could be better

◮ translating into morphologically rich / free word order languages (e.g., German)
◮ handling noisy inputs (e.g., chats, Twitter feeds)
◮ dealing with documents (instead of single sentences)

→ precision / translation accuracy

Conclusion

SMT is a cheap way to access foreign material
