Machine Translation
Dan Klein, John DeNero
UC Berkeley
Translation Task
- Text as input & text as output.
- Input & output have roughly the same information content.
- Output is more predictable than a language modeling task.
- Lots of naturally occurring examples (but not much metadata).
Translation Examples
English-German News Test 2013 (a standard dev set)
English: Republican leaders justified their policy by the need to combat electoral fraud.
German: Die Führungskräfte der Republikaner rechtfertigen ihre Politik mit der Notwendigkeit , den Wahlbetrug zu bekämpfen .
Word-by-word gloss (German word order): The Executives of the republican justify your politics With of the need , the election fraud to fight .
Variety in Human-Generated Translations
- An asteroid large enough to destroy a mid-size city brushed the Earth within a short distance of 463,000 km without being detected in advance. Astronomers did not know the event until four days later. About 50 meters in diameter, the asteroid came from the direction of the sun, making it very difficult for astronomers to discover it.
- An asteroid, large enough to flatten an average city, brushed past the Earth within a short range of 463,000 kilometers, but was not discovered in time. It was four days after the close shave could astronomers tell about it. This asteroid, about 50 meters in diameter, was flying from the direction of the sun, thus astronomers could hardly detect it.
- An asteroid big enough to ruin a mid-sized city passed by in a close range of 463,000 kilometres off Earth without being noticed in advance. Astronomers learned of the event four days later. The asteroid, about 50 metres in diameter, came in the direction of Sun, which made it hard for astronomers to discover.
From https://catalog.ldc.upenn.edu/LDC2003T17
Variety in Machine Translations
- A small planet, whose is as big as could destroy a middle sized city, passed by the earth with a distance of 463 thousand kilometers. This was not found in advance. The astronomists got to know this incident 4 days later. This small planet is 50m in diameter. The astonomists are hard to find it for it comes from the direction of sun.
- A volume enough to destroy a medium city small planet is big, flit earth within 463,000 kilometres of close however were not in advance discovered, astronomer just knew this matter after four days. This small planet diameter is about 50 metre, from the direction at sun, therefore astronomer very hard to discovers it.
- An asteroid that was large enough to destroy a medium-sized city, swept across the earth at a short distance of 463,000 kilometers, but was not detected early. Astronomers learned about it four days later. The asteroid is about 50 meters in diameter and comes from the direction of the sun, making it difficult for astronomers to spot it.
From https://catalog.ldc.upenn.edu/LDC2003T17
Translations shown: Google Translate (2020), a commercial system from 2002, and a human-generated reference translation.
Evaluation
BLEU Score
BLEU score: geometric mean of 1-, 2-, 3-, and 4-gram precision vs. a reference, multiplied by a brevity penalty (which harshly penalizes translations shorter than the reference).
$$\text{Matched}_i = \sum_{t_i} \min\Big\{ C_h(t_i),\; \max_j C_j(t_i) \Big\}$$
$$P_i = \frac{\text{Matched}_i}{H_i} \qquad B = \exp\Big\{ \min\Big(0,\; \frac{n - L}{n}\Big) \Big\} \qquad \text{BLEU} = B \cdot \Big( \prod_{i=1}^{4} P_i \Big)^{1/4}$$
Here $t_i$ ranges over the $i$-grams of hypothesis $h$, $C_h(t_i)$ and $C_j(t_i)$ count $t_i$ in $h$ and in reference $j$, $H_i$ is the number of $i$-gram tokens in $h$, $n$ is the hypothesis length, and $L$ is the length of the shortest reference.
If "of the" appears twice in hypothesis h but at most once in each reference, then only the first occurrence counts as "correct": this is "clipped" precision of n-gram tokens.
The brevity penalty only matters if the hypothesis corpus is shorter than the shortest reference.
BLEU is a geometric mean of clipped precisions, scaled down by the brevity penalty.
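A minimal Python sketch of the formulas above, assuming pre-tokenized input (punctuation already split into separate tokens), the shortest reference as L, and no smoothing. The function names are illustrative, not from any library; real tools may differ in tokenization and in how they pick the reference length.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[k:k + n]) for k in range(len(tokens) - n + 1))


def clipped_precisions(hyp, refs, max_n=4):
    """P_i = Matched_i / H_i for i = 1..max_n; each hypothesis n-gram count
    C_h(t_i) is clipped at its largest count in any single reference."""
    precisions = []
    for i in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, i)
        ref_counts = [ngram_counts(ref, i) for ref in refs]
        matched = sum(min(count, max(rc[gram] for rc in ref_counts))
                      for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())  # H_i
        precisions.append(matched / total if total else 0.0)
    return precisions


def brevity_penalty(hyp_len, ref_len):
    """B = exp(min(0, (n - L) / n)); equals 1 when the hypothesis is not shorter."""
    if hyp_len == 0:
        return 0.0
    return math.exp(min(0.0, (hyp_len - ref_len) / hyp_len))


def sentence_bleu(hyp, refs, max_n=4):
    """Geometric mean of clipped precisions times the brevity penalty."""
    precisions = clipped_precisions(hyp, refs, max_n)
    if min(precisions) == 0.0:
        return 0.0  # any zero precision makes the unsmoothed geometric mean zero
    ref_len = min(len(ref) for ref in refs)  # shortest reference
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(len(hyp), ref_len) * math.exp(log_mean)


hyp = "In this sense , these measures partially undermine the democratic system of the United States .".split()
ref = "In this sense , the measures will partially undermine the American democratic system .".split()
print(round(100 * sentence_bleu(hyp, [ref]), 2))  # 26.52
```

This scores a single sentence. Corpus BLEU instead sums Matched_i, H_i, and the lengths over all segments before taking the ratios and the geometric mean.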
Evaluation with BLEU
Reference: In this sense, the measures will partially undermine the American democratic system.
Hypothesis: In this sense, these measures partially undermine the democratic system of the United States.
...
BLEU = 26.52, 75.0/40.0/21.4/7.7 (BP=1.000, ratio=1.143, hyp_len=16, ref_len=14)
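The reported numbers can be checked by hand, assuming punctuation is split into separate tokens (which gives hyp_len = 16 and ref_len = 14). Four hypothesis unigrams are unmatched ("these", "of", "United", "States"), leaving 12 of 16; likewise 6 of 15 bigrams, 3 of 14 trigrams, and 1 of 13 four-grams match:

$$P_1 = \tfrac{12}{16},\quad P_2 = \tfrac{6}{15},\quad P_3 = \tfrac{3}{14},\quad P_4 = \tfrac{1}{13}, \qquad B = \exp\Big\{\min\Big(0,\; \tfrac{16 - 14}{16}\Big)\Big\} = 1$$
$$\text{BLEU} = 1 \cdot \Big(\tfrac{12}{16} \cdot \tfrac{6}{15} \cdot \tfrac{3}{14} \cdot \tfrac{1}{13}\Big)^{1/4} = \Big(\tfrac{9}{1820}\Big)^{1/4} \approx 0.2652$$

The hypothesis is longer than the reference (ratio 16/14 ≈ 1.143), so the brevity penalty has no effect here.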
(Papineni et al., 2002) BLEU: a method for automatic evaluation of machine translation.
Corpus BLEU Correlations with Average Human Judgments
Figure from G. Doddington (NIST)
These are ecological correlations over multiple segments; segment-level BLEU scores are noisy. Commercial machine translation providers all seem to perform human evaluations of some sort.
(Ma et al., 2019) Results of the WMT19 Metrics Shared Task: Segment-Level and Strong MT Systems Pose Big Challenges
Human Evaluations
Direct assessment: adequacy & fluency
- Monolingual: Ask humans to compare machine translation to a human-generated reference. (Easier to source annotators)
- Bilingual: Ask humans to compare machine translation to the source sentence that was translated. (Compares to human quality)
- Annotators can assess segments (sentences) or whole documents.
- Segments can be assessed with or without document context.
Ranking assessment:
- Raters are presented with 2 or more translations.
- A human-generated reference may be provided, along with the source.
- "In a pairwise ranking experiment, human raters assessing adequacy and fluency show a stronger preference for human over machine translation when evaluating documents as compared to isolated sentences." (Laubli et al., 2018)
Editing assessment: How many edits are required to reach human quality
(Laubli et al., 2018) Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation
Translationese and Evaluation
Translated text can: (Baker et al., 1993; Graham et al., 2019)
- be more explicit than the original source
- be less ambiguous
- be simplified (lexically, syntactically, and stylistically)
- display a preference for conventional grammaticality
- avoid repetition
- exaggerate target language features
- display features of the source language
"If we consider only original source text (i.e. not translated from another language, or translationese), then we find evidence showing that human parity has not been achieved." (Toral et al., 2018)
(Baker et al., 1993) Corpus linguistics and translation studies: Implications and applications.
(Graham et al., 2019) Translationese in Machine Translation Evaluation.
(Toral et al., 2018) Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation
WMT 2019 Evaluation
2019 segment-in-context direct assessment (Barrault et al., 2019):
(Barrault et al., 2019) Findings of the 2019 Conference on Machine Translation (WMT19)
Statistical Machine Translation (1990 - 2015)
When I look at an article in Russian, I say: “This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.” Warren Weaver (1949)
Levels of Transfer: Vauquois Triangle (1968)
[Figure: Vauquois triangle; target language example: "I will do it later"]
Data-Driven Machine Translation
- Parallel corpus gives translation examples: "Yo lo haré de muy buen grado" / "I will do it gladly"; "Después lo veras" / "You will see later"
- Target language corpus gives examples of well-formed sentences: "I will get to it later", "See you later", "He will do it"
- Machine translation system: model of translation
- Novel sentence in the source language: "Yo lo haré después" → "I will do it later"
Stitching Together Fragments
- Parallel corpus gives translation examples: "Yo lo haré de muy buen grado" / "I will do it gladly"; "Después lo veras" / "You will see later"
- Machine translation system (model of translation): "Yo lo haré después" → "I will do it later", stitched together from syntactic fragments (S, NP, VP, PRP, MD, VB, ADV) of the examples
Evolution of the Noisy Channel Model
$$P(e \mid f) \propto P(f \mid e) \cdot P(e)$$
$$\arg\max_e P(e \mid f) = \arg\max_e P(f \mid e) \cdot P(e)$$
$$P(e \mid f) \propto \exp\Big( \sum_i w_i \cdot f_i(e, f) \Big)$$
$$P(e \mid f) \propto P(f \mid e)^{\phi_{tm}} \cdot P(e)^{\phi_{lm}}$$
Each feature $f_i(e, f)$ is a score such as $\log P(e)$; the weights ($w_i$, or $\phi_{tm}$ and $\phi_{lm}$) are chosen to minimize loss.
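A minimal sketch of scoring candidate translations with the log-linear form, assuming hypothetical feature names `log_p_tm` (log P(f|e)) and `log_p_lm` (log P(e)) and made-up probabilities. With both weights set to 1, the score is log P(f|e) + log P(e), so the argmax agrees with the noisy channel; other weights play the role of φ_tm and φ_lm.

```python
import math


def loglinear_score(features, weights):
    """Unnormalized log score: sum_i w_i * f_i(e, f)."""
    return sum(weights[name] * value for name, value in features.items())


def rerank(candidates, weights):
    """Pick the highest-scoring e and normalize P(e | f) over the candidate list.

    candidates maps each translation e to its feature dict {name: f_i(e, f)}.
    """
    scores = {e: loglinear_score(feats, weights) for e, feats in candidates.items()}
    top = max(scores.values())  # subtract the max before exp() for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    probs = {e: math.exp(s - top) / z for e, s in scores.items()}
    best = max(scores, key=scores.get)
    return best, probs


# Hypothetical candidates for "Yo lo haré después" with made-up model probabilities.
candidates = {
    "I will do it later": {"log_p_tm": math.log(0.2), "log_p_lm": math.log(0.01)},
    "later I it will do": {"log_p_tm": math.log(0.3), "log_p_lm": math.log(0.0001)},
}
print(rerank(candidates, {"log_p_tm": 1.0, "log_p_lm": 1.0})[0])  # I will do it later
print(rerank(candidates, {"log_p_tm": 1.0, "log_p_lm": 0.0})[0])  # later I it will do
```

Setting the language-model weight to 0 lets the disfluent candidate win on translation-model score alone, which is why the weights are tuned rather than fixed at 1.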
Word Alignment
Extracting Translation Rules
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
Example extracted rule: VP: will do it ADV ↔ lo haré ADV
Frequency statistics on these rules serve as features in a translation model
Counting Aligned Phrases
d’assister à la reunion et ||| to attend the meeting and
assister à la reunion ||| attend the meeting
la reunion et ||| the meeting and
nous ||| we
…
- Relative frequencies are the most important features in a phrase-based or syntax-based model.
- Scoring a phrase under a lexical model is the second most important feature.
- Estimation does not involve choosing among segmentations of a sentence into phrases.
Slide by Greg Durrett
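A minimal sketch of extracting phrase pairs consistent with a word alignment and scoring them by relative frequency. It assumes alignments are given as 0-based (source_index, target_index) links and a maximum phrase length of 4 (both hypothetical choices). The commonly used extraction algorithm also extends target spans over unaligned boundary words; this sketch does not. Every consistent pair is counted, so no segmentation of the sentence is chosen.

```python
from collections import Counter


def extract_phrases(src, tgt, alignment, max_len=4):
    """Return all phrase pairs (src_phrase, tgt_phrase) consistent with `alignment`.

    A pair of spans is consistent if it contains at least one link and no link
    connects a word inside one span to a word outside the other.
    """
    pairs = []
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            # Target positions linked to the source span [s1, s2].
            linked = [t for (s, t) in alignment if s1 <= s <= s2]
            if not linked:
                continue  # spans with no links are skipped
            t1, t2 = min(linked), max(linked)
            if t2 - t1 + 1 > max_len:
                continue
            # Reject if a target word in [t1, t2] links to a source word outside [s1, s2].
            if any((t1 <= t <= t2) and not (s1 <= s <= s2) for (s, t) in alignment):
                continue
            pairs.append((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs


def relative_frequencies(pairs):
    """phi(tgt | src) = count(src, tgt) / count(src), over all extracted pairs."""
    pair_counts = Counter(pairs)
    src_counts = Counter(src for src, _ in pairs)
    return {(src, tgt): n / src_counts[src] for (src, tgt), n in pair_counts.items()}


src = ["assister", "à", "la", "reunion"]
tgt = ["attend", "the", "meeting"]
links = [(0, 0), (2, 1), (3, 2)]  # "à" is left unaligned
pairs = extract_phrases(src, tgt, links)
print(("assister à la reunion", "attend the meeting") in pairs)  # True
```

In a full system the pairs from every sentence of the parallel corpus are pooled before `relative_frequencies` is applied.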
Interlude: Lexical Translation Models
HMM Alignment Model
Alignment Link Posteriors
Non-zero for any alignment vector (for sentence pair e, f) that has word e aligned to word f
$$c(e \mid f; \mathbf{e}, \mathbf{f}) = \sum_i \sum_j \delta(e, e_j) \cdot \delta(f, f_i) \cdot P(a(j) = i \mid \mathbf{e}, \mathbf{f}) = \sum_i \sum_j \delta(e, e_j) \cdot \delta(f, f_i) \cdot \sum_{\mathbf{a}} P(\mathbf{a} \mid \mathbf{e}, \mathbf{f}) \cdot \delta(a(j), i)$$
$\delta(a(j), i)$ is non-zero for any alignment vector (for sentence pair $\mathbf{e}, \mathbf{f}$) that has position $j$ aligned to position $i$.
Model 1 Posteriors
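Example: f = das Haus ist klitzeklein (positions 1 2 3 4), e = the house is very small, alignment vector a = (1, 2, 3, 4, 4), so both "very" and "small" align to "klitzeklein".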
$$P(a(j) = i \mid \mathbf{e}, \mathbf{f}) = \frac{t(e_j \mid f_i)}{\sum_{i'} t(e_j \mid f_{i'})}$$
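A minimal sketch of the Model 1 posterior and the expected counts c(e|f; e, f) defined on the previous slide, wrapped in one EM re-estimation step for t(e|f). Assumptions: no NULL source word, sentences as token lists, t stored as a dict keyed by (e_word, f_word), and a uniform start; the function names are hypothetical.

```python
from collections import defaultdict


def model1_posteriors(e_sent, f_sent, t):
    """post[j][i] = P(a(j) = i | e, f) = t(e_j | f_i) / sum_i' t(e_j | f_i')."""
    post = []
    for e_word in e_sent:
        scores = [t[(e_word, f_word)] for f_word in f_sent]
        total = sum(scores)
        post.append([s / total for s in scores])
    return post


def expected_counts(corpus, t):
    """c(e | f) summed over sentence pairs:
    sum_i sum_j delta(e, e_j) delta(f, f_i) P(a(j) = i | e, f)."""
    counts = defaultdict(float)
    for e_sent, f_sent in corpus:
        post = model1_posteriors(e_sent, f_sent, t)
        for j, e_word in enumerate(e_sent):
            for i, f_word in enumerate(f_sent):
                counts[(e_word, f_word)] += post[j][i]
    return counts


def em_step(corpus, t):
    """Re-estimate t(e | f) by normalizing the expected counts for each source word f."""
    counts = expected_counts(corpus, t)
    totals = defaultdict(float)
    for (e_word, f_word), c in counts.items():
        totals[f_word] += c
    return {(e_word, f_word): c / totals[f_word] for (e_word, f_word), c in counts.items()}


corpus = [(["the", "house"], ["das", "Haus"]),
          (["the", "book"], ["das", "Buch"]),
          (["a", "book"], ["ein", "Buch"])]
t = defaultdict(lambda: 1.0)  # uniform start; only ratios of t matter for the posteriors
for _ in range(5):
    t = em_step(corpus, t)
```

After a few iterations t(the | das) exceeds t(house | das), because "das" co-occurs with "the" in two sentence pairs and the posteriors shift probability mass accordingly.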
HMM Alignment Model
(Vogel, Stephan, Hermann Ney, and Christoph Tillmann, 1996) "HMM-based word alignment in statistical translation." (Liang, Percy, Ben Taskar, and Dan Klein, 2006) "Alignment by agreement."
$$P(\mathbf{a}, \mathbf{e} \mid \mathbf{f}) \propto \prod_j P(e_j \mid f_{a(j)}) \cdot P(a(j) \mid a(j-1))$$
Example: f = das Haus ist klitzeklein (positions 1 2 3 4), e = the house is very small, alignment vector a = (1, 2, 3, 4, 4).
HMM Alignment Model Posteriors
$\alpha_j(i)$ covers the words up to position $j$ (summing over alignments); $\beta_j(i)$ covers the words after position $j$.
$$P(a(j) = i \mid \mathbf{e}, \mathbf{f}) = \sum_{\mathbf{a}} P(\mathbf{a} \mid \mathbf{e}, \mathbf{f}) \cdot \delta(a(j), i) = \frac{\sum_{\mathbf{a}} P(\mathbf{a}, \mathbf{e} \mid \mathbf{f}) \cdot \delta(a(j), i)}{P(\mathbf{e} \mid \mathbf{f})} = \frac{\alpha_j(i) \cdot \beta_j(i)}{P(\mathbf{e} \mid \mathbf{f})}$$
$$\alpha_j(i) = \sum_{i'} P(a(j) = i \mid a(j-1) = i') \cdot P(e_j \mid f_i) \cdot \alpha_{j-1}(i')$$
$$\beta_j(i) = \sum_{i''} P(a(j+1) = i'' \mid a(j) = i) \cdot P(e_{j+1} \mid f_{i''}) \cdot \beta_{j+1}(i'')$$
$\delta(a(j), i)$ is non-zero for alignments where $j$ is aligned to $i$.
Forward-Backward algorithm
$$\alpha_j(i) = P(e_1, e_2, \ldots, e_j, a(j) = i \mid \mathbf{f}) \qquad \beta_j(i) = P(e_{j+1}, \ldots, e_\ell \mid a(j) = i, \mathbf{f})$$
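A minimal sketch of the forward-backward recursions for the HMM alignment model, computing P(a(j) = i | e, f) = α_j(i)·β_j(i) / P(e|f). The slides do not specify the first-position distribution or the transition parameterization, so this sketch assumes a uniform `uniform_init` and a hypothetical `jump_transition` that favors moving one position forward; indices are 0-based and the emission table `t` is an assumed dict keyed by (e_word, f_word).

```python
import math


def jump_transition(i_prev, i, I):
    """Hypothetical P(a(j) = i | a(j-1) = i_prev), proportional to exp(-|i - i_prev - 1|)."""
    weights = [math.exp(-abs(k - i_prev - 1)) for k in range(I)]
    return weights[i] / sum(weights)


def uniform_init(i, I):
    """Assumed P(a(1) = i): uniform over source positions."""
    return 1.0 / I


def hmm_posteriors(e_sent, f_sent, t, trans, init):
    """post[j][i] = alpha_j(i) * beta_j(i) / P(e | f)."""
    J, I = len(e_sent), len(f_sent)
    emit = [[t[(e_sent[j], f_sent[i])] for i in range(I)] for j in range(J)]
    # Forward: alpha[j][i] = P(e_1..e_j, a(j) = i | f)
    alpha = [[0.0] * I for _ in range(J)]
    for i in range(I):
        alpha[0][i] = init(i, I) * emit[0][i]
    for j in range(1, J):
        for i in range(I):
            alpha[j][i] = emit[j][i] * sum(trans(k, i, I) * alpha[j - 1][k] for k in range(I))
    # Backward: beta[j][i] = P(e_{j+1}..e_J | a(j) = i, f)
    beta = [[0.0] * I for _ in range(J)]
    beta[J - 1] = [1.0] * I
    for j in range(J - 2, -1, -1):
        for i in range(I):
            beta[j][i] = sum(trans(i, k, I) * emit[j + 1][k] * beta[j + 1][k] for k in range(I))
    likelihood = sum(alpha[J - 1])  # P(e | f)
    return [[alpha[j][i] * beta[j][i] / likelihood for i in range(I)] for j in range(J)]
```

Each row of the result sums to 1. Practical implementations usually rescale α and β (or work in log space) to avoid underflow on long sentences.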
Interlude: Phrase-Based Models
What's Next?
- Neural models: attention and the transformer architecture
- Tricks of the trade: back-translation, knowledge distillation, subword models, and coverage vectors