SLIDE 1

Machine Translation

  • Prof. Sameer Singh

CS 295: STATISTICAL NLP WINTER 2017

February 28, 2017

Based on slides from Jason Eisenstein, Chris Dyer, Alan Ritter, Yejin Choi, and everyone else they copied from.

SLIDE 2

Upcoming…

Homework

  • Homework 4 is due on March 13
  • Write-up and data will be released soon.

Project

  • Status report due in 1 week: March 7, 2017
  • Instructions coming today!
  • Almost final report, only 5 pages

Summaries

  • Paper summaries: February 28, March 14
  • Summary 1 graded

CS 295: STATISTICAL NLP (WINTER 2017) 2

SLIDE 3

Outline

  • Machine Translation
  • Introduction to Statistical MT
  • IBM Translation Models

SLIDE 4

Outline

  • Machine Translation
  • Introduction to Statistical MT
  • IBM Translation Models

SLIDE 5

Machine Translation

English: I have always imagined Paradise as a kind of library.
Spanish: Yo, que me figuraba el Paraíso / Bajo la especie de una biblioteca.

SLIDE 6

Challenges: Word Order

Even for SVO languages:

  • English: I will buy it / French: Je vais l’acheter (I will it buy)
  • English: I bought it / French: Je l’ai acheté (I it have bought)

SVO vs. SOV:

  • English: IBM bought Lotus / Japanese: IBM Lotus bought

SLIDE 7

Challenges: Lexical Ambiguity

English bill → Spanish pico (a bird's beak) or cuenta (an invoice)

SLIDE 8

Challenges: Pronouns

Dropping Pronouns

  • In Spanish, you can recover the pronoun from the verb inflection:
    Vivimos en Atlanta → We live in Atlanta
  • The inflection does not always determine the pronoun, so discourse context is often crucial:
    Vive en Atlanta → She/he/it lives in Atlanta

Different Pronouns

  • English possessive pronouns take the gender of the owner:
    Marie rides her bike
  • French possessive pronouns take the gender of the object:
    Marie monte sur son vélo (vélo is masculine, so son rather than sa)

SLIDE 9

Challenges: Tenses

The preterite tense is for events with a definite time, e.g.

I biked to work this morning

The imperfect is for events with indefinite times, e.g.

I biked to work all last summer

English uses the same past form (biked) in both cases, so to translate English to Spanish, we must pick the right tense.

SLIDE 10

Challenges: Idioms

  • Why in the world
  • Kick the bucket
  • Lend me your ears
  • Dead as a doornail
  • As cool as a cucumber
  • Hold your horses
  • Storm in a teacup
  • Bob's your uncle
  • Blue in the face
  • Head in the clouds

SLIDE 11

Rules for Machine Translation

Rules for translating much or many into Russian:

    if preceding word is how
        return skol’ko
    else if preceding word is as
        return stol’ko zhe
    else if word is much
        if preceding word is very
            return nil
        else if following word is a noun
            return mnogo
    else (word is many)
        if preceding word is a preposition and following word is a noun
            return mnogii
        else
            return mnogo

Panov (1960)
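
The rules above can be written as an ordinary function. This is a minimal Python sketch, assuming the sentence is already tokenized and part-of-speech tagged; the function name `translate_much_many` and the tag labels `"NOUN"` and `"ADP"` (preposition) are hypothetical choices, not part of Panov's rules. The rules' nil is encoded as the empty string, and `None` marks the one case the rules leave open (much neither preceded by very nor followed by a noun).

```python
from typing import List, Optional


def translate_much_many(words: List[str], tags: List[str], i: int) -> Optional[str]:
    """Translate words[i] ("much" or "many") into Russian using Panov's (1960) rules.

    Assumes `tags` holds one coarse POS tag per word (hypothetical labels:
    "NOUN" for nouns, "ADP" for prepositions). Returns "" for the rules' nil
    (no word-level translation) and None when no rule applies.
    """
    word = words[i].lower()
    prev_word = words[i - 1].lower() if i > 0 else None
    prev_tag = tags[i - 1] if i > 0 else None
    next_tag = tags[i + 1] if i + 1 < len(tags) else None

    if prev_word == "how":
        return "skol'ko"
    elif prev_word == "as":
        return "stol'ko zhe"
    elif word == "much":
        if prev_word == "very":
            return ""  # nil: "very much" gets no word-level translation here
        elif next_tag == "NOUN":
            return "mnogo"
        return None  # the original rules do not cover this case
    else:  # word is "many"
        if prev_tag == "ADP" and next_tag == "NOUN":
            return "mnogii"
        return "mnogo"
```

For example, `translate_much_many(["in", "many", "cities"], ["ADP", "ADJ", "NOUN"], 1)` returns `"mnogii"`. Every new context needs another hand-written branch, which is what makes such rule cascades hard to scale.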

SLIDE 12

The Vauquois Triangle

SLIDE 13

Outline

  • Machine Translation
  • Introduction to Statistical MT
  • IBM Translation Models

SLIDE 14

Statistical Machine Translation

SLIDE 15

Parallel Corpus: Examples

SLIDE 16

Parallel Corpus: Examples

SLIDE 17

Parallel Corpus: Examples

SLIDE 18

Parallel Corpus: Examples

SLIDE 19

The Rosetta Stone

SLIDE 20

Warren Weaver (1949)

SLIDE 21

Parallel Corpus: Examples

SLIDE 22

Parallel Corpus: Examples

SLIDE 23

Noisy Channel Model

“Noisy Channel” Decoder

SLIDE 24

Noisy Channel Model

“Noisy Channel” Decoder
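
In the standard noisy-channel formulation of MT, the foreign sentence $f$ is treated as an English sentence $e$ that passed through a noisy channel. The decoder recovers the most probable $e$ using Bayes' rule, and $P(f)$ drops out because it does not depend on $e$. The symbols $f$, $e$ and $\hat{e}$ are a common notation assumed here, not taken from the slide:

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \frac{P(f \mid e)\,P(e)}{P(f)}
        = \arg\max_{e} \underbrace{P(f \mid e)}_{\text{translation model}}\,\underbrace{P(e)}_{\text{language model}}
```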

SLIDE 25

Example: Noisy Channel

SLIDE 26

Example: Noisy Channel

SLIDE 27

Components of an MT system

  • Language Model
  • Translation Model
  • Decoding Algorithm
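
As a toy illustration of how these components fit together, the Python sketch below scores each candidate English translation e of a French input f by log P(f | e) + log P(e) and keeps the best one. All probabilities, the example phrase maison bleue, and the names `LANGUAGE_MODEL`, `TRANSLATION_MODEL` and `decode` are hypothetical. Scoring a short, hand-supplied candidate list stands in for a real decoding algorithm, which has to search a vastly larger space of translations.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical language model P(e): made-up probabilities for English candidates.
LANGUAGE_MODEL: Dict[str, float] = {
    "blue house": 0.010,
    "house blue": 0.0001,
    "blue home": 0.002,
}

# Hypothetical translation model P(f | e), keyed by (f, e); values are made up.
TRANSLATION_MODEL: Dict[Tuple[str, str], float] = {
    ("maison bleue", "blue house"): 0.30,
    ("maison bleue", "house blue"): 0.30,
    ("maison bleue", "blue home"): 0.10,
}


def decode(f: str, candidates: List[str]) -> str:
    """Return the candidate e that maximizes log P(f | e) + log P(e)."""
    def score(e: str) -> float:
        return math.log(TRANSLATION_MODEL[(f, e)]) + math.log(LANGUAGE_MODEL[e])

    return max(candidates, key=score)
```

Both word orders get the same made-up translation-model score, so it is the language model that prefers blue house over house blue: `decode("maison bleue", ["blue house", "house blue", "blue home"])` returns `"blue house"`.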

SLIDE 28

Components of an MT system

SLIDE 29

Evaluating MT

SLIDE 30

Human Evaluation

  • Fluency
  • Adequacy

A: furious nAgA on wednesday , the tribal minimum pur of ten schools also was burnt
B: furious nAgA on wednesday the tribal pur mini ten schools of them was also burnt

SLIDE 31

Automated Evaluation

  • Fluency
  • Adequacy

SLIDE 32

BLEU Score

SLIDE 33

BLEU Score: Example

  • ‘extension of isi in uttar pradesh’
  • ‘isi ’s expansion in uttar pradesh’
  • ‘the spread of isi in uttar pradesh’
  • ‘isi spreading in uttar pradesh’

the spread of isi in uttar pradesh
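
A minimal sentence-level BLEU sketch in Python: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty computed against the reference length closest to the candidate's. It is unsmoothed, so any n-gram order with zero matches gives a score of 0; corpus-level BLEU instead pools the counts over all sentences before combining them. Treating the four quoted strings as references and the last line as the candidate is an assumption about this example, and the names `bleu`, `ngram_counts` and `REFERENCES` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with uniform weights 1/max_n."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its largest count in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: a zero precision makes BLEU zero
        log_precision_sum += math.log(clipped / total)

    # Brevity penalty against the closest reference length (the shorter one on ties).
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)


REFERENCES = [s.split() for s in [
    "extension of isi in uttar pradesh",
    "isi 's expansion in uttar pradesh",
    "the spread of isi in uttar pradesh",
    "isi spreading in uttar pradesh",
]]
```

Under that reading, the candidate matches the third reference exactly, so `bleu("the spread of isi in uttar pradesh".split(), REFERENCES)` returns 1.0. A shorter candidate such as isi in uttar pradesh keeps perfect n-gram precisions but is penalized for brevity, scoring exp(1 - 5/4), about 0.78.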

SLIDE 34

BLEU Score: Example

  • ‘extension of isi in uttar pradesh’
  • ‘isi ’s expansion in uttar pradesh’
  • ‘the spread of isi in uttar pradesh’
  • ‘isi spreading in uttar pradesh’

the spread of isi in uttar pradesh

SLIDE 35

BLEU’s not bad…

  • G. Doddington, NIST

SLIDE 36

Outline

  • Machine Translation
  • Introduction to Statistical MT
  • IBM Translation Models

SLIDE 37

Statistical Translation Model

English: And the program was implemented
French: La programmation a été mise en application

SLIDE 38

Word Alignment: Direct

SLIDE 39

Word Alignment: 1-to-Many

SLIDE 40

Word Alignment: Reordering

SLIDE 41

Word Alignment: Inserting

SLIDE 42

Word Alignment: Dropping
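
In the IBM models, an alignment assigns each foreign word f_j exactly one English position a_j, with position 0 reserved for a special NULL word. The Python sketch below encodes the sentence pair And the program was implemented / La programmation a été mise en application this way and counts each English word's fertility (how many French words it generates). The particular alignment is an illustrative assumption, not taken from the slides, and `fertility` is a hypothetical helper.

```python
from typing import Dict, List

# English side, with a NULL token at position 0 for foreign words that have no English source.
english = ["NULL", "And", "the", "program", "was", "implemented"]
french = ["La", "programmation", "a", "été", "mise", "en", "application"]

# Illustrative alignment (an assumption, not from the slides):
# alignment[j] is the index in `english` of the word that generates french[j].
alignment = [2, 3, 4, 4, 5, 5, 5]


def fertility(alignment: List[int], num_english: int) -> Dict[int, int]:
    """Count how many foreign words each English position generates."""
    counts = {i: 0 for i in range(num_english)}
    for i in alignment:
        counts[i] += 1
    return counts


fert = fertility(alignment, len(english))
for i, word in enumerate(english):
    generated = [french[j] for j, a_j in enumerate(alignment) if a_j == i]
    print(f"{word:<12} fertility={fert[i]}  {generated}")
```

Because each French word points to exactly one English position, this encoding covers 1-to-many alignments (was generates a été), dropped English words (And has fertility 0) and inserted foreign words (aligned to NULL), but not many-to-one alignments where several English words jointly produce a single French word.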

SLIDE 43

Translating with Alignments
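
For IBM Model 1 as it is usually presented, take an English sentence $e = e_1 \dots e_l$ plus a NULL word $e_0$, a foreign sentence $f = f_1 \dots f_m$, an alignment $a = a_1 \dots a_m$, a word translation table $t(f \mid e)$, and a constant $\epsilon$ for the probability of the foreign length. The joint probability factorizes over foreign positions; summing over all $(l+1)^m$ alignments gives the translation probability, and because each $a_j$ is chosen independently the sum of products becomes a product of sums:

```latex
\begin{aligned}
P(f, a \mid e) &= \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} t(f_j \mid e_{a_j}) \\
P(f \mid e) &= \sum_{a} P(f, a \mid e) = \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)
\end{aligned}
```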

SLIDE 44

Example: Translation Prob

SLIDE 45

IBM Models

  • Model 1
  • Model 2
  • Model 3/4/5

SLIDE 46

Word Alignment Algorithm
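
Word alignments are usually learned without supervision from a parallel corpus. One standard way to do this for IBM Model 1 is expectation-maximization (EM). Here is a compact Python sketch, assuming whitespace-tokenized sentence pairs, a NULL token prepended to every English sentence, uniform initialization of t(f | e), and a fixed number of iterations. The names `train_ibm_model1` and `best_alignment` and the three-sentence German-English corpus are hypothetical illustrations.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each pair is (foreign tokens, English tokens).
Corpus = List[Tuple[List[str], List[str]]]


def train_ibm_model1(corpus: Corpus, iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Estimate word translation probabilities t(f | e), keyed by (f, e), with EM."""
    # Prepend NULL to every English sentence so foreign words may align to nothing.
    pairs = [(f_sent, ["NULL"] + e_sent) for f_sent, e_sent in corpus]
    f_vocab = {f for f_sent, _ in pairs for f in f_sent}
    # Uniform initialization: every t(f | e) starts at 1 / |foreign vocabulary|.
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)  # expected c(f, e)
        total: Dict[str, float] = defaultdict(float)  # expected c(e)
        # E-step: share each foreign token among the English tokens of its
        # sentence in proportion to the current t(f | e).
        for f_sent, e_sent in pairs:
            for f in f_sent:
                norm = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalize expected counts so that t(. | e) sums to 1 for each e.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_alignment(f_sent: List[str], e_sent: List[str],
                   t: Dict[Tuple[str, str], float]) -> List[int]:
    """Align each foreign word to the English position (0 = NULL) with the highest t(f | e)."""
    e_with_null = ["NULL"] + e_sent
    return [max(range(len(e_with_null)), key=lambda i: t.get((f, e_with_null[i]), 0.0))
            for f in f_sent]


# Hypothetical toy corpus (German-English).
CORPUS: Corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(CORPUS)
print(best_alignment(["das", "Haus"], ["the", "house"], t))
```

No sentence pair is labeled with an alignment; the co-occurrence statistics alone push the learned table toward das→the, Haus→house, ein→a and Buch→book over the iterations. Under Model 1 the most probable alignment picks, for each foreign word independently, the English word with the highest t(f | e), which is what `best_alignment` does.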
