

SLIDE 1

Alignment in Machine Translation

CMSC 723 / LING 723 / INST 725

Marine Carpuat

marine@cs.umd.edu

SLIDE 2

Centauri/Arcturan [Knight, 1997]

  • 1a. ok-voon ororok sprok .
  • 1b. at-voon bichat dat .
  • 7a. lalok farok ororok lalok sprok izok enemok .
  • 7b. wat jjat bichat wat dat vat eneat .
  • 2a. ok-drubel ok-voon anok plok sprok .

  • 2b. at-drubel at-voon pippat rrat dat .
  • 8a. lalok brok anok plok nok .
  • 8b. iat lat pippat rrat nnat .
  • 3a. erok sprok izok hihok ghirok .
  • 3b. totat dat arrat vat hilat .
  • 9a. wiwok nok izok kantok ok-yurp .
  • 9b. totat nnat quat oloat at-yurp .
  • 4a. ok-voon anok drok brok jok .
  • 4b. at-voon krat pippat sat lat .
  • 10a. lalok mok nok yorok ghirok clok .
  • 10b. wat nnat gat mat bat hilat .
  • 5a. wiwok farok izok stok .
  • 5b. totat jjat quat cat .
  • 11a. lalok nok crrrok hihok yorok zanzanok .
  • 11b. wat nnat arrat mat zanzanat .
  • 6a. lalok sprok izok jok stok .
  • 6b. wat dat krat quat cat .
  • 12a. lalok rarok nok izok hihok mok .
  • 12b. wat nnat forat arrat vat gat .

Your assignment, translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp

SLIDE 3

Centauri/Arcturan was actually Spanish/English…

  • 1a. Garcia and associates .
  • 1b. Garcia y asociados .
  • 7a. the clients and the associates are enemies .
  • 7b. los clientes y los asociados son enemigos .
  • 2a. Carlos Garcia has three associates .
  • 2b. Carlos Garcia tiene tres asociados .
  • 8a. the company has three groups .
  • 8b. la empresa tiene tres grupos .
  • 3a. his associates are not strong .
  • 3b. sus asociados no son fuertes .
  • 9a. its groups are in Europe .
  • 9b. sus grupos estan en Europa .
  • 4a. Garcia has a company also .
  • 4b. Garcia tambien tiene una empresa .
  • 10a. the modern groups sell strong pharmaceuticals
  • 10b. los grupos modernos venden medicinas fuertes
  • 5a. its clients are angry .
  • 5b. sus clientes estan enfadados .
  • 11a. the groups do not sell zenzanine .
  • 11b. los grupos no venden zanzanina .
  • 6a. the associates are also angry .
  • 6b. los asociados tambien estan enfadados .

  • 12a. the small groups are not modern .
  • 12b. los grupos pequenos no son modernos .

Translate: Clients do not sell pharmaceuticals in Europe.

SLIDE 4

1947

When I look at an article in Russian, I say to myself: "This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode."

Warren Weaver

SLIDE 5

1988

More about the IBM story: 20 years of bitext workshop

SLIDE 6

Noisy Channel Model for Machine Translation

  • The noisy channel model decomposes machine translation into two independent subproblems

– Language modeling
– Translation modeling / Alignment

SLIDE 7

WORD ALIGNMENT

SLIDE 8

How can we model p(f|e)?

  • We’ll describe the word alignment models introduced at IBM in the early 90s
  • Assumption: each French word f is aligned to exactly one English word e

– Including NULL

SLIDE 9

Word Alignment Vector Representation

  • Alignment vector a = [2,3,4,5,6,6,6]

– length of a = length of sentence f
– a_i = j if French position i is aligned to English position j
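The vector a = [2,3,4,5,6,6,6] can be made concrete with a small sketch; the sentence pair below is a hypothetical illustration chosen to fit that vector, not text from the slide:

```python
# Hypothetical sentence pair illustrating the alignment vector above.
# e[0] is the special NULL word; a[i] = j means French position i is
# aligned to English position j.
f = ["Le", "programme", "a", "ete", "mis", "en", "application"]
e = ["NULL", "And", "the", "program", "has", "been", "implemented"]
a = [2, 3, 4, 5, 6, 6, 6]

assert len(a) == len(f)               # exactly one link per French word
links = [(f[i], e[j]) for i, j in enumerate(a)]
print(links[:2])                      # [('Le', 'the'), ('programme', 'program')]
```

Note how the last three French words all carry j = 6: the 1-to-many direction of the model allows several French words to align to one English word, but never the reverse.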

SLIDE 10

Formalizing the connection between word alignments & the translation model

  • We define a conditional model

– Projecting word translations
– Through alignment links

SLIDE 11

How many possible alignments in A?

  • How many possible alignments for (f,e) where

– f is a French sentence with m words
– e is an English sentence with l words

  • For each of the m French words, we choose an alignment link among (l+1) English words
  • Answer: (l + 1)^m
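Since each of the m French positions independently picks one of the (l+1) English positions, the count is easy to check numerically (a minimal sketch):

```python
def num_alignments(l, m):
    # Each of the m French words picks one of the l English words or NULL,
    # independently of the others: (l + 1) ** m possible alignment vectors.
    return (l + 1) ** m

# Even a 6-word English sentence paired with a 7-word French sentence
# already admits 7**7 = 823543 alignments.
print(num_alignments(6, 7))
```

The exponential growth is why later slides lean on independence assumptions instead of enumerating alignments.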
SLIDE 12

IBM Model 1: generative story

  • Input

– an English sentence of length l
– a length m for the French sentence

  • For each French position i in 1..m

– Pick an English source index a_i
– Choose a translation

SLIDE 13

IBM Model 1: generative story

  • Input

– an English sentence of length l
– a length m for the French sentence

  • For each French position i in 1..m

– Pick an English source index a_i
– Choose a translation

Alignment is based on word positions, not word identities
Alignment probabilities are UNIFORM
Words are translated independently
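Under these assumptions the joint probability of a French sentence and an alignment factorizes into a uniform alignment term and per-word translation probabilities. A minimal sketch of that factorization (the toy table t is hypothetical):

```python
def model1_prob(f, e, a, t, epsilon=1.0):
    # IBM Model 1: p(f, a | e) = epsilon / (l + 1)**m * prod_i t(f_i | e_{a_i})
    # e[0] is NULL, so there are l = len(e) - 1 real English words.
    l, m = len(e) - 1, len(f)
    p = epsilon / (l + 1) ** m        # uniform probability of the alignment
    for i, j in enumerate(a):
        p *= t[(f[i], e[j])]          # independent per-word translation
    return p

t = {("casa", "house"): 0.8, ("verde", "green"): 0.7}   # toy values
p = model1_prob(["casa", "verde"], ["NULL", "green", "house"], [2, 1], t)
```

Here p = (1/9) * 0.8 * 0.7: the alignment term is the same for every alignment, which is exactly the "alignment probabilities are UNIFORM" assumption.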

SLIDE 14

IBM Model 1: Parameters

  • t(f|e)

– Word translation probability table
– for all words in the French & English vocabularies

SLIDE 15

IBM Model 1: generative story

  • Input

– an English sentence of length l
– a length m for the French sentence

  • For each French position i in 1..m

– Pick an English source index a_i
– Choose a translation

SLIDE 16

Improving on IBM Model 1: IBM Model 2

  • Input

– an English sentence of length l
– a length m for the French sentence

  • For each French position i in 1..m

– Pick an English source index a_i
– Choose a translation

Remove the assumption that q is uniform

SLIDE 17

IBM Model 2: Parameters

  • q(j|i,l,m)

– now a table
– not uniform as in IBM1

  • How many parameters are there?
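One way to answer the question: for every (i, l, m) triple the table stores a distribution over the (l+1) possible values of j, so the parameter count grows quickly with sentence length. A rough sketch, assuming sentence lengths up to some cap L (my own illustration, not from the slide):

```python
def num_q_params(L):
    # For every pair of lengths (l, m) up to L, each of the m French
    # positions i needs its own distribution over (l + 1) English positions j.
    return sum((l + 1) * m
               for l in range(1, L + 1)
               for m in range(1, L + 1))

print(num_q_params(40))   # 705200 parameters just for q, lengths up to 40
```

Compare this with Model 1, which has no q table at all: the extra expressiveness of Model 2 is paid for in parameters to estimate.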

SLIDE 18

2 Remaining Tasks

Inference

  • Given

– a sentence pair (e,f)
– an alignment model with parameters t(f|e) and q(j|i,l,m)

  • What is the most probable alignment a?

Parameter Estimation

  • Given

– training data (lots of sentence pairs)
– a model definition

  • How do we learn the parameters t(f|e) and q(j|i,l,m)?

SLIDE 19

Inference

  • Inputs

– Model parameter tables for t and q
– A sentence pair

  • How do we find the alignment a that maximizes P(f,a|e)?

– Hint: recall independence assumptions!
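The hint resolves as follows: because each French word is aligned and translated independently, the global argmax decomposes into one independent argmax per position. A sketch of this inference (toy tables are hypothetical; with no q table it falls back to a uniform q, i.e. IBM Model 1):

```python
def best_alignment(f, e, t, q=None):
    # For each French position i, independently pick the English position j
    # maximizing q(j | i, l, m) * t(f_i | e_j). With uniform q this is
    # exactly IBM Model 1 inference.
    l, m = len(e) - 1, len(f)          # e[0] is NULL
    a = []
    for i, fi in enumerate(f):
        def score(j):
            qj = q.get((j, i, l, m), 0.0) if q else 1.0 / (l + 1)
            return qj * t.get((fi, e[j]), 0.0)
        a.append(max(range(l + 1), key=score))
    return a

t = {("casa", "house"): 0.8, ("verde", "green"): 0.7}   # toy table
print(best_alignment(["casa", "verde"], ["NULL", "green", "house"], t))  # [2, 1]
```

No search over the exponential alignment space is needed: the independence assumptions reduce inference to m separate maximizations over l+1 choices each.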


SLIDE 25

1 Remaining Task

Inference

  • Given a sentence pair (e,f), what is the most probable alignment a?

Parameter Estimation

  • How do we learn the parameters t(f|e) and q(j|i,l,m) from data?

SLIDE 26

Parameter Estimation

  • Problem

– Parallel corpus gives us (e,f) pairs only; a is hidden

  • We know how to

– estimate t and q, given (e,a,f)
– compute p(f,a|e), given t and q

  • Solution: Expectation-Maximization algorithm (EM)

– E-step: given the current parameters, compute expected values of the hidden variable
– M-step: given the expected counts, re-estimate the parameters

SLIDE 27

Parameter Estimation: EM

Use “Soft” values instead of binary counts

SLIDE 28

Parameter Estimation: soft EM

  • Soft EM considers all possible alignment links
  • Each alignment link now has a weight
SLIDE 29

EM for IBM Model 1

  • Expectation (E)-step:

– Compute expected counts for parameters (t) based on summing over hidden variable

  • Maximization (M)-step:

– Compute the maximum likelihood estimate of t from the expected counts
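The two steps above can be sketched end-to-end. This is a minimal IBM Model 1 trainer, matching the simplifications of the worked example that follows (no NULL word, no q table); the variable names are my own:

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    # pairs: list of (french_words, english_words) sentence pairs.
    # Initialize t(f|e) uniformly over all co-occurring word pairs.
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = {(f, e): 1.0 / len(f_vocab)
         for fs, es in pairs for f in fs for e in es}
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        # E-step: fractional ("soft") counts, summing over alignment links.
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)   # normalizer for word f
                for e in es:
                    w = t[(f, e)] / z            # weight of link f-e
                    count[(f, e)] += w
                    total[e] += w
        # M-step: maximum likelihood re-estimate from the expected counts.
        for f, e in count:
            t[(f, e)] = count[(f, e)] / total[e]
    return t

# The toy corpus from the next slide: green house / casa verde, the house / la casa.
t = train_model1([(["casa", "verde"], ["green", "house"]),
                  (["la", "casa"], ["the", "house"])], iterations=20)
```

After a few iterations t[("casa", "house")] dominates its competitors, because "casa" co-occurs with "house" in both sentence pairs; this is exactly the behavior the next slides trace by hand.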

SLIDE 30

green house / casa verde
the house / la casa

EM example: initialization

For the rest of this talk, French = Spanish

SLIDE 31

EM example: E-step

(a) compute the probability of each alignment p(a,f|e)

Note: we’re making simplifying assumptions in this example

  • No NULL word
  • We only consider alignments where each French and English word is aligned to something
  • We ignore q!
SLIDE 32

EM example: E-step

(b) normalize to get p(a|f,e)

SLIDE 33

EM example: E-step

(c) compute expected counts (weighting each count by p(a|e,f))

SLIDE 34

EM example: M-step

Compute probability estimate by normalizing expected counts

SLIDE 35

EM example: next iteration

SLIDE 36

Parameter Estimation with EM

  • EM guarantees that data likelihood does not decrease across iterations

  • EM can get stuck in a local optimum

– Initialization matters

SLIDE 37

Word Alignment with IBM Models 1, 2

  • Probabilistic models with strong independence assumptions

– Results in linguistically naïve models: asymmetric, 1-to-many alignments
– But allows efficient parameter estimation and inference

  • Alignments are hidden variables

– unlike words, which are observed
– require unsupervised learning (EM algorithm)

SLIDE 38

PHRASE-BASED MODELS

SLIDE 39

Phrase-based models

  • Most common way to model P(F|E) nowadays (instead of IBM models)

Distortion: relates the start position of French phrase f_i to the end position of f_(i-1); it gives the probability of two consecutive English phrases being separated by a particular span in French.

SLIDE 40

Phrase alignments are derived from word alignments

Get high confidence alignment links by intersecting IBM word alignments from both directions

This means that the IBM model represents P(Spanish|English)

SLIDE 41

Phrase alignments are derived from word alignments

Improve recall by adding some links from the union of alignments

SLIDE 42

Phrase alignments are derived from word alignments

Extract phrases that are consistent with word alignment
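The consistency condition can be stated directly in code: a phrase pair is kept if it covers at least one alignment link and no link crosses its border. A minimal sketch (spans are inclusive word-index ranges; the names and the toy alignment are my own):

```python
def consistent(f_span, e_span, links):
    # links: set of (i, j) word-alignment links, French index i, English j.
    # A phrase pair is consistent with the alignment iff some link lies
    # inside the box AND no link connects a word inside one span to a word
    # outside the other.
    (f1, f2), (e1, e2) = f_span, e_span
    inside = [(i, j) for i, j in links if f1 <= i <= f2 and e1 <= j <= e2]
    crossing = [(i, j) for i, j in links
                if (f1 <= i <= f2) != (e1 <= j <= e2)]
    return bool(inside) and not crossing

links = {(0, 0), (1, 2), (2, 1)}          # toy alignment with a reordering
print(consistent((1, 2), (1, 2), links))  # True: the reordered block as a whole
print(consistent((1, 1), (1, 1), links))  # False: link (1, 2) crosses out
```

Phrase extraction then simply enumerates candidate span pairs and keeps the consistent ones.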

SLIDE 43

Phrase Translation Probabilities

  • Given such phrases, we can get the required statistics for the model from relative frequencies of the extracted phrase pairs
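The relative-frequency estimate can be sketched in a few lines (toy extracted pairs, hypothetical counts):

```python
from collections import Counter

def phrase_table(extracted):
    # extracted: list of (french_phrase, english_phrase) pairs pulled from
    # the word-aligned corpus. phi(f|e) = count(f, e) / count(e).
    pair_counts = Counter(extracted)
    e_counts = Counter(e for _, e in extracted)
    return {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}

table = phrase_table([("casa verde", "green house"),
                      ("casa verde", "green house"),
                      ("casa", "green house")])
print(table[("casa verde", "green house")])   # 2/3
```

No EM is needed at this stage: once the phrase pairs are extracted, the phrase translation probabilities are plain maximum likelihood estimates.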

SLIDE 44

Phrase-based Machine Translation

SLIDE 45

RECAP

SLIDE 46

Noisy Channel Model for Machine Translation

  • The noisy channel model decomposes machine translation into two independent subproblems

– Language modeling
– Translation modeling / Alignment

SLIDE 47

Word Alignment with IBM Models 1, 2

  • Probabilistic models with strong independence assumptions
  • Alignments are hidden variables

– unlike words, which are observed
– require unsupervised learning (EM algorithm)

  • Word alignments are often used as building blocks for more complex translation models

– e.g., phrase-based machine translation