

SLIDE 1

Alignment in Machine Translation

CMSC 723 / LING 723 / INST 725

Marine Carpuat

marine@cs.umd.edu

SLIDE 2

Centauri/Arcturan [Knight, 1997]

  • 1a. ok-voon ororok sprok .
  • 1b. at-voon bichat dat .
  • 7a. lalok farok ororok lalok sprok izok enemok .
  • 7b. wat jjat bichat wat dat vat eneat .
  • 2a. ok-drubel ok-voon anok plok sprok .

  • 2b. at-drubel at-voon pippat rrat dat .
  • 8a. lalok brok anok plok nok .
  • 8b. iat lat pippat rrat nnat .
  • 3a. erok sprok izok hihok ghirok .
  • 3b. totat dat arrat vat hilat .
  • 9a. wiwok nok izok kantok ok-yurp .
  • 9b. totat nnat quat oloat at-yurp .
  • 4a. ok-voon anok drok brok jok .
  • 4b. at-voon krat pippat sat lat .
  • 10a. lalok mok nok yorok ghirok clok .
  • 10b. wat nnat gat mat bat hilat .
  • 5a. wiwok farok izok stok .
  • 5b. totat jjat quat cat .
  • 11a. lalok nok crrrok hihok yorok zanzanok .
  • 11b. wat nnat arrat mat zanzanat .
  • 6a. lalok sprok izok jok stok .
  • 6b. wat dat krat quat cat .
  • 12a. lalok rarok nok izok hihok mok .
  • 12b. wat nnat forat arrat vat gat .

Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp

SLIDE 3

Centauri/Arcturan was actually Spanish/English…

  • 1a. Garcia and associates .
  • 1b. Garcia y asociados .
  • 7a. the clients and the associates are enemies .
  • 7b. los clientes y los asociados son enemigos .
  • 2a. Carlos Garcia has three associates .
  • 2b. Carlos Garcia tiene tres asociados .
  • 8a. the company has three groups .
  • 8b. la empresa tiene tres grupos .
  • 3a. his associates are not strong .
  • 3b. sus asociados no son fuertes .
  • 9a. its groups are in Europe .
  • 9b. sus grupos estan en Europa .
  • 4a. Garcia has a company also .
  • 4b. Garcia tambien tiene una empresa .
  • 10a. the modern groups sell strong pharmaceuticals
  • 10b. los grupos modernos venden medicinas fuertes
  • 5a. its clients are angry .
  • 5b. sus clientes estan enfadados .
  • 11a. the groups do not sell zenzanine .
  • 11b. los grupos no venden zanzanina .
  • 6a. the associates are also angry .
  • 6b. los asociados tambien estan enfadados .

  • 12a. the small groups are not modern .
  • 12b. los grupos pequenos no son modernos .

Translate: Clients do not sell pharmaceuticals in Europe.

SLIDE 4

1947

When I look at an article in Russian, I say to myself: "This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode."

Warren Weaver

SLIDE 5

1988

More about the IBM story: 20 years of bitext workshop

SLIDE 6

Noisy Channel Model for Machine Translation

  • The noisy channel model decomposes machine translation into two independent subproblems

– Language modeling
– Translation modeling / Alignment

SLIDE 7

WORD ALIGNMENT

SLIDE 8

How can we model p(f|e)?

  • We’ll describe the word alignment models introduced at IBM in the early 90s
  • Assumption: each French word f is aligned to exactly one English word e

– Including NULL

SLIDE 9

Word Alignment Vector Representation

  • Alignment vector a = [2,3,4,5,6,6,6]

– length of a = length of sentence f
– a_i = j if French position i is aligned to English position j
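The vector a = [2,3,4,5,6,6,6] can be made concrete with a small sketch; the sentence pair below is a hypothetical illustration chosen to fit that vector, not text from the slide:

```python
# Hypothetical sentence pair illustrating the alignment vector above.
# e[0] is the special NULL word; a[i] = j means French position i is
# aligned to English position j.
f = ["Le", "programme", "a", "ete", "mis", "en", "application"]
e = ["NULL", "And", "the", "program", "has", "been", "implemented"]
a = [2, 3, 4, 5, 6, 6, 6]

assert len(a) == len(f)               # exactly one link per French word
links = [(f[i], e[j]) for i, j in enumerate(a)]
print(links[:2])                      # [('Le', 'the'), ('programme', 'program')]
```

Note how the last three French words all carry j = 6: the 1-to-many direction of the model allows several French words to align to one English word, but never the reverse.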

SLIDE 10

Formalizing the connection between word alignments & the translation model

  • We define a conditional model

– Projecting word translations
– Through alignment links

SLIDE 11

How many possible alignments in A?

  • How many possible alignments for (f,e) where

– f is a French sentence with m words
– e is an English sentence with l words

  • For each of the m French words, we choose an alignment link among (l+1) English words
  • Answer: (l + 1)^m
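Since each of the m French positions independently picks one of the (l+1) English positions, the count is easy to check numerically (a minimal sketch):

```python
def num_alignments(l, m):
    # Each of the m French words picks one of the l English words or NULL,
    # independently of the others: (l + 1) ** m possible alignment vectors.
    return (l + 1) ** m

# Even a 6-word English sentence paired with a 7-word French sentence
# already admits 7**7 = 823543 alignments.
print(num_alignments(6, 7))
```

The exponential growth is why later slides lean on independence assumptions instead of enumerating alignments.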
SLIDE 12

IBM Model 1: generative story

  • Input

– an English sentence of length l
– a length m for the French sentence

  • For each French position i in 1..m

– Pick an English source index a_i
– Choose a translation

SLIDE 13

IBM Model 1: generative story

  • Input

– an English sentence of length l
– a length m for the French sentence

  • For each French position i in 1..m

– Pick an English source index a_i
– Choose a translation

Alignment is based on word positions, not word identities
Alignment probabilities are UNIFORM
Words are translated independently
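Under these assumptions the joint probability of a French sentence and an alignment factorizes into a uniform alignment term and per-word translation probabilities. A minimal sketch of that factorization (the toy table t is hypothetical):

```python
def model1_prob(f, e, a, t, epsilon=1.0):
    # IBM Model 1: p(f, a | e) = epsilon / (l + 1)**m * prod_i t(f_i | e_{a_i})
    # e[0] is NULL, so there are l = len(e) - 1 real English words.
    l, m = len(e) - 1, len(f)
    p = epsilon / (l + 1) ** m        # uniform probability of the alignment
    for i, j in enumerate(a):
        p *= t[(f[i], e[j])]          # independent per-word translation
    return p

t = {("casa", "house"): 0.8, ("verde", "green"): 0.7}   # toy values
p = model1_prob(["casa", "verde"], ["NULL", "green", "house"], [2, 1], t)
```

Here p = (1/9) * 0.8 * 0.7: the alignment term is the same for every alignment, which is exactly the "alignment probabilities are UNIFORM" assumption.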

SLIDE 14

IBM Model 1: Parameters

  • t(f|e)

– Word translation probability table
– for all words in the French & English vocabularies

SLIDE 15

IBM Model 1: generative story

  • Input

– an English sentence of length l
– a length m for the French sentence

  • For each French position i in 1..m

– Pick an English source index a_i
– Choose a translation

SLIDE 16

Improving on IBM Model 1: IBM Model 2

  • Input

– an English sentence of length l
– a length m for the French sentence

  • For each French position i in 1..m

– Pick an English source index a_i
– Choose a translation

Remove the assumption that q is uniform

SLIDE 17

IBM Model 2: Parameters

  • q(j|i,l,m)

– now a table
– not uniform as in IBM1

  • How many parameters are there?
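One way to answer the question: for every (i, l, m) triple the table stores a distribution over the (l+1) possible values of j, so the parameter count grows quickly with sentence length. A rough sketch, assuming sentence lengths up to some cap L (my own illustration, not from the slide):

```python
def num_q_params(L):
    # For every pair of lengths (l, m) up to L, each of the m French
    # positions i needs its own distribution over (l + 1) English positions j.
    return sum((l + 1) * m
               for l in range(1, L + 1)
               for m in range(1, L + 1))

print(num_q_params(40))   # 705200 parameters just for q, lengths up to 40
```

Compare this with Model 1, which has no q table at all: the extra expressiveness of Model 2 is paid for in parameters to estimate.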

SLIDE 18

2 Remaining Tasks

Inference

  • Given

– a sentence pair (e,f)
– an alignment model with parameters t(f|e) and q(j|i,l,m)

  • What is the most probable alignment a?

Parameter Estimation

  • Given

– training data (lots of sentence pairs)
– a model definition

  • How do we learn the parameters t(f|e) and q(j|i,l,m)?

SLIDE 19

Inference

  • Inputs

– Model parameter tables for t and q
– A sentence pair

  • How do we find the alignment a that maximizes P(f,a|e)?

– Hint: recall independence assumptions!
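The hint resolves as follows: because each French word is aligned and translated independently, the global argmax decomposes into one independent argmax per position. A sketch of this inference (toy tables are hypothetical; with no q table it falls back to a uniform q, i.e. IBM Model 1):

```python
def best_alignment(f, e, t, q=None):
    # For each French position i, independently pick the English position j
    # maximizing q(j | i, l, m) * t(f_i | e_j). With uniform q this is
    # exactly IBM Model 1 inference.
    l, m = len(e) - 1, len(f)          # e[0] is NULL
    a = []
    for i, fi in enumerate(f):
        def score(j):
            qj = q.get((j, i, l, m), 0.0) if q else 1.0 / (l + 1)
            return qj * t.get((fi, e[j]), 0.0)
        a.append(max(range(l + 1), key=score))
    return a

t = {("casa", "house"): 0.8, ("verde", "green"): 0.7}   # toy table
print(best_alignment(["casa", "verde"], ["NULL", "green", "house"], t))  # [2, 1]
```

No search over the exponential alignment space is needed: the independence assumptions reduce inference to m separate maximizations over l+1 choices each.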


SLIDE 25

1 Remaining Task

Inference

  • Given a sentence pair (e,f), what is the most probable alignment a?

Parameter Estimation

  • How do we learn the parameters t(f|e) and q(j|i,l,m) from data?

SLIDE 26

Parameter Estimation

  • Problem

– Parallel corpus gives us (e,f) pairs only; a is hidden

  • We know how to

– estimate t and q, given (e,a,f)
– compute p(f,a|e), given t and q

  • Solution: Expectation-Maximization algorithm (EM)

– E-step: given the current parameters, compute expected values of the hidden variable
– M-step: given the expected counts, re-estimate the parameters

SLIDE 27

Parameter Estimation: EM

Use “Soft” values instead of binary counts

SLIDE 28

Parameter Estimation: soft EM

  • Soft EM considers all possible alignment links
  • Each alignment link now has a weight
SLIDE 29

EM for IBM Model 1

  • Expectation (E)-step:

– Compute expected counts for parameters (t) based on summing over hidden variable

  • Maximization (M)-step:

– Compute the maximum likelihood estimate of t from the expected counts
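The two steps above can be sketched end-to-end. This is a minimal IBM Model 1 trainer, matching the simplifications of the worked example that follows (no NULL word, no q table); the variable names are my own:

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    # pairs: list of (french_words, english_words) sentence pairs.
    # Initialize t(f|e) uniformly over all co-occurring word pairs.
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = {(f, e): 1.0 / len(f_vocab)
         for fs, es in pairs for f in fs for e in es}
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        # E-step: fractional ("soft") counts, summing over alignment links.
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)   # normalizer for word f
                for e in es:
                    w = t[(f, e)] / z            # weight of link f-e
                    count[(f, e)] += w
                    total[e] += w
        # M-step: maximum likelihood re-estimate from the expected counts.
        for f, e in count:
            t[(f, e)] = count[(f, e)] / total[e]
    return t

# The toy corpus from the next slide: green house / casa verde, the house / la casa.
t = train_model1([(["casa", "verde"], ["green", "house"]),
                  (["la", "casa"], ["the", "house"])], iterations=20)
```

After a few iterations t[("casa", "house")] dominates its competitors, because "casa" co-occurs with "house" in both sentence pairs; this is exactly the behavior the next slides trace by hand.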

SLIDE 30

green house / casa verde
the house / la casa

EM example: initialization

For the rest of this talk, French = Spanish

SLIDE 31

EM example: E-step

(a) compute the probability of each alignment p(a,f|e)

Note: we’re making simplifying assumptions in this example

  • No NULL word
  • We only consider alignments where each French and English word is aligned to something
  • We ignore q!
SLIDE 32

EM example: E-step

(b) normalize to get p(a|f,e)

SLIDE 33

EM example: E-step

(c) compute expected counts (weighting each count by p(a|e,f))

SLIDE 34

EM example: M-step

Compute probability estimate by normalizing expected counts

SLIDE 35

EM example: next iteration

SLIDE 36

Parameter Estimation with EM

  • EM guarantees that data likelihood does not decrease across iterations

  • EM can get stuck in a local optimum

– Initialization matters

SLIDE 37

Word Alignment with IBM Models 1, 2

  • Probabilistic models with strong independence assumptions

– Results in linguistically naïve models: asymmetric, 1-to-many alignments
– But allows efficient parameter estimation and inference

  • Alignments are hidden variables

– unlike words, which are observed
– require unsupervised learning (EM algorithm)

SLIDE 38

PHRASE-BASED MODELS

SLIDE 39

Phrase-based models

  • Most common way to model P(F|E) nowadays (instead of IBM models)

Distortion: relates the start position of French phrase f_i to the end position of f_(i-1); it gives the probability of two consecutive English phrases being separated by a particular span in French.

SLIDE 40

Phrase alignments are derived from word alignments

Get high confidence alignment links by intersecting IBM word alignments from both directions

This means that the IBM model represents P(Spanish|English)

SLIDE 41

Phrase alignments are derived from word alignments

Improve recall by adding some links from the union of alignments

SLIDE 42

Phrase alignments are derived from word alignments

Extract phrases that are consistent with word alignment
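The consistency condition can be stated directly in code: a phrase pair is kept if it covers at least one alignment link and no link crosses its border. A minimal sketch (spans are inclusive word-index ranges; the names and the toy alignment are my own):

```python
def consistent(f_span, e_span, links):
    # links: set of (i, j) word-alignment links, French index i, English j.
    # A phrase pair is consistent with the alignment iff some link lies
    # inside the box AND no link connects a word inside one span to a word
    # outside the other.
    (f1, f2), (e1, e2) = f_span, e_span
    inside = [(i, j) for i, j in links if f1 <= i <= f2 and e1 <= j <= e2]
    crossing = [(i, j) for i, j in links
                if (f1 <= i <= f2) != (e1 <= j <= e2)]
    return bool(inside) and not crossing

links = {(0, 0), (1, 2), (2, 1)}          # toy alignment with a reordering
print(consistent((1, 2), (1, 2), links))  # True: the reordered block as a whole
print(consistent((1, 1), (1, 1), links))  # False: link (1, 2) crosses out
```

Phrase extraction then simply enumerates candidate span pairs and keeps the consistent ones.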

SLIDE 43

Phrase Translation Probabilities

  • Given such phrases, we can get the required statistics for the model from relative frequencies of the extracted phrase pairs
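The relative-frequency estimate can be sketched in a few lines (toy extracted pairs, hypothetical counts):

```python
from collections import Counter

def phrase_table(extracted):
    # extracted: list of (french_phrase, english_phrase) pairs pulled from
    # the word-aligned corpus. phi(f|e) = count(f, e) / count(e).
    pair_counts = Counter(extracted)
    e_counts = Counter(e for _, e in extracted)
    return {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}

table = phrase_table([("casa verde", "green house"),
                      ("casa verde", "green house"),
                      ("casa", "green house")])
print(table[("casa verde", "green house")])   # 2/3
```

No EM is needed at this stage: once the phrase pairs are extracted, the phrase translation probabilities are plain maximum likelihood estimates.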

SLIDE 44

Phrase-based Machine Translation

SLIDE 45

RECAP

SLIDE 46

Noisy Channel Model for Machine Translation

  • The noisy channel model decomposes machine translation into two independent subproblems

– Language modeling
– Translation modeling / Alignment

SLIDE 47

Word Alignment with IBM Models 1, 2

  • Probabilistic models with strong independence assumptions
  • Alignments are hidden variables

– unlike words, which are observed
– require unsupervised learning (EM algorithm)

  • Word alignments are often used as building blocks for more complex translation models

– e.g., phrase-based machine translation