4CSLL5 IBM Translation Models
Martin Emms
October 29, 2020

Outline
Parameter learning (efficient)
  How to sum alignments efficiently
  Efficient EM via p((j, i) ∈ a|o, s)

Avoiding Exponential Cost
but what about Exponential cost?

◮ the learnability of translation probabilities in an unsupervised fashion from just a corpus of sentence pairs is a remarkable thing
◮ however, as we have formulated it, each possible alignment has to be considered in turn, and each contributes increments to the expected counts
◮ it was already noted that the number of possible alignments is (ℓs + 1)^ℓo, i.e. exponential in the length of o. For ℓs + 1 = ℓo = 10, this is 10^10, or 10,000 million
◮ so unless a way can be found to make the EM process on this model much more efficient, its learnability in principle would just be an interesting curiosity
◮ it turns out that by studying a little more closely the formula in which alignments are summed over, and doing some conversions of sums-of-products into products-of-sums, it is indeed possible to make the EM process on this model much more efficient
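The blow-up is easy to make concrete; a minimal sketch (the function name is ours, not from the slides):

```python
# Each of the l_o observed positions independently picks one of the
# l_s + 1 source positions (the source words plus NULL), so the number
# of alignments is (l_s + 1) ** l_o.
def num_alignments(l_s: int, l_o: int) -> int:
    return (l_s + 1) ** l_o

print(num_alignments(9, 10))  # 10000000000, i.e. 10,000 million
```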
Summing over alignments

Looking at the brute-force EM algorithm, we need to calculate p(a|o, s) – call this γd(a). It is fairly easy to see that this is²

  γd(a) = ∏_j p(oj|sa(j)) / Σ_{a′} ∏_j p(oj|sa′(j))        (11)

◮ The numerator is a product.
◮ It turns out the denominator can also be turned into a product of sums.

² in the formula for p(o, a, ℓo, s) everything except the translation probabilities is going to cancel out when you take ratios …
Summing over alignments contd

each j can be aligned to any i between 0 and I, hence

  Σ_a ∏_j t(oj|sa(j)) = Σ_{a(1)=0}^{I} … Σ_{a(J)=0}^{I} ∏_{j=1}^{J} t(oj|sa(j))
                      = Σ_{a(1)=0}^{I} … Σ_{a(J)=0}^{I} [ t(o1|sa(1)) … t(oJ|sa(J)) ]

each Σ_{a(j)=0}^{I}(·) affects just one t(oj|sa(j)) term, and this means we can use a sum-of-products to product-of-sums conversion, hence

                      = ∏_{j=1}^{J} [ Σ_{a(j)=0}^{I} t(oj|sa(j)) ]
                      = ∏_{j=1}^{J} [ Σ_{i=0}^{I} t(oj|si) ]
Pause: did you believe that?

the key step above was a conversion from a sum-of-products to a product-of-sums. For the case of o and s both having length 2, one can relatively easily verify this by brute force:

  Σ_{a(1)=0}^{2} Σ_{a(2)=0}^{2} ∏_{j=1}^{2} t(oj|sa(j))
    = t(o1|s0) t(o2|s0) + t(o1|s0) t(o2|s1) + t(o1|s0) t(o2|s2)
    + t(o1|s1) t(o2|s0) + t(o1|s1) t(o2|s1) + t(o1|s1) t(o2|s2)
    + t(o1|s2) t(o2|s0) + t(o1|s2) t(o2|s1) + t(o1|s2) t(o2|s2)
    = t(o1|s0)[t(o2|s0) + t(o2|s1) + t(o2|s2)]
    + t(o1|s1)[t(o2|s0) + t(o2|s1) + t(o2|s2)]
    + t(o1|s2)[t(o2|s0) + t(o2|s1) + t(o2|s2)]
    = [t(o1|s0) + t(o1|s1) + t(o1|s2)] [t(o2|s0) + t(o2|s1) + t(o2|s2)]
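The same check can be run numerically for arbitrary values; a small sketch in which random numbers stand in for the t(oj|si), since the identity is purely algebraic:

```python
import itertools
import math
import random

random.seed(0)
J, I = 2, 2  # o has length 2; source positions 0..2 (0 is NULL)
# arbitrary values standing in for t(o_j | s_i): t[j][i]
t = [[random.random() for _ in range(I + 1)] for _ in range(J)]

# sum-of-products: sum over all (I+1)^J alignments of the product over j
lhs = sum(math.prod(t[j][a[j]] for j in range(J))
          for a in itertools.product(range(I + 1), repeat=J))

# product-of-sums: product over positions j of the per-position sums
rhs = math.prod(sum(t[j]) for j in range(J))

assert abs(lhs - rhs) < 1e-12
```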
Making p(a|o, s) into a product

Armed with this, we can rewrite (11), the formula for γd(a), as

  γd(a) = ∏_{j=1}^{J} t(oj|sa(j)) / ∏_{j=1}^{J} [ Σ_{i=0}^{I} t(oj|si) ]

and this is just one big product

        = ∏_{j=1}^{J} [ t(oj|sa(j)) / Σ_{i=0}^{I} t(oj|si) ]

◮ each term in this product can be seen as the probability of a particular alignment step (j, i), given o, s, and it makes sense for the overall alignment probability to be a product of the individual steps. If we use the notation γd(j, i) for this probability of a single alignment step, we get

  γd(a) = ∏_{j=1}^{J} γd(j, a(j))
We have for γd(j, i)

  γd(j, i) = t(oj|si) / Σ_{i′=0}^{I} t(oj|si′)        (12)

◮ crucially, the cost of calculating γd(j, i) is trivial – it is linear in the length of s
◮ The efficient version of EM rests on seeing that once p((j, i)|o, s) is worked out for each j, i, the desired expected (o, s) counts can be worked out from them
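A sketch of how cheaply (12) can be tabulated (the t values here are made up for illustration):

```python
# gamma_d(j, i) = t(o_j | s_i) / sum_{i'} t(o_j | s_{i'});
# tabulating it needs one pass over the source positions per j,
# i.e. cost linear in the length of s
def gamma_table(t):
    """t[j][i] stands in for t(o_j | s_i)."""
    gamma = []
    for row in t:                # one row per observed position j
        z = sum(row)             # denominator, computed once per j
        gamma.append([x / z for x in row])
    return gamma

# made-up illustrative values
t = [[0.2, 0.5, 0.3],
     [0.1, 0.1, 0.8]]
g = gamma_table(t)
# each row of g is a distribution over source positions 0..I
for row in g:
    assert abs(sum(row) - 1.0) < 1e-12
```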
Outline
Parameter learning (efficient)
  How to sum alignments efficiently
  Efficient EM via p((j, i) ∈ a|o, s)

recall the [E] step of the brute-force algorithm (if o, s are the dth pair):

  for each pair (o, s)
    for each a
      calculate p(a|o, s)
      for each j ∈ 1 : ℓo              // pseudo counts of (o, s) word pairs
        #(oj, sa(j)) += p(a|o, s)      //   in virtual data

◮ consider a particular (j, i). As you go through all possible a for o, s, each time the alignment a contains this pairing you make the increment γd(a). We aim for an algorithm which works out quickly, for each (j, i), what the sum of these increments will be, i.e.³

  Σ_{a|(j,i)∈a} γd(a)        (13)

³ the notation Σ_{a|(j,i)∈a}(·) means 'sum over only those a that have (j, i) ∈ a'
Summing over the alignments gives just γd(j, i)

for o position j, the s position i is fixed. For every other o position j′, j′ can be aligned to any i between 0 and I, hence

  Σ_{a|(j,i)∈a} γd(a) = Σ_{a(1)=0}^{I} … Σ_{a(j−1)=0}^{I} Σ_{a(j+1)=0}^{I} … Σ_{a(J)=0}^{I} γd(j, i) ∏_{j′≠j} γd(j′, a(j′))

we can pull out γd(j, i) and again do a sum-of-products to product-of-sums conversion with the rest, hence

                      = γd(j, i) ∏_{j′≠j} [ Σ_{a(j′)=0}^{I} γd(j′, a(j′)) ]

each sum runs over every possible alignment destination for j′, and so each one sums to one, so you get just

                      = γd(j, i)
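This collapse can also be confirmed by brute force on a small example; a sketch with made-up t values:

```python
import itertools
import math

# made-up values standing in for t(o_j | s_i); rows are o positions j,
# columns are s positions i (0 is NULL)
t = [[0.2, 0.5, 0.3],
     [0.1, 0.1, 0.8],
     [0.4, 0.4, 0.2]]
J, I = len(t), len(t[0]) - 1

# normaliser: sum over all alignments of the product of t values
Z = sum(math.prod(t[j][a[j]] for j in range(J))
        for a in itertools.product(range(I + 1), repeat=J))

def gamma_a(a):      # gamma_d(a) = p(a | o, s)
    return math.prod(t[j][a[j]] for j in range(J)) / Z

def gamma_ji(j, i):  # gamma_d(j, i), equation (12)
    return t[j][i] / sum(t[j])

# for every (j, i): summing gamma_d(a) over just those a with a(j) = i
# gives exactly gamma_d(j, i)
for j in range(J):
    for i in range(I + 1):
        constrained = sum(gamma_a(a)
                          for a in itertools.product(range(I + 1), repeat=J)
                          if a[j] == i)
        assert abs(constrained - gamma_ji(j, i)) < 1e-12
```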
Efficient EM algorithm for IBM Model 1 training

  initialise tr(o|s) uniformly
  repeat [E] followed by [M] till convergence
    [E] for each o ∈ Vo
          for each s ∈ Vs ∪ {NULL}
            #(o, s) = 0
        for each pair (o, s)
          for each j ∈ 1 : ℓo
            for each i ∈ 0 : ℓs
              #(oj, si) += p((j, i)|o, s)    (using (12))
    [M] for each s ∈ Vs ∪ {NULL}
          for each o ∈ Vo
            tr(o|s) = #(o, s) / Σ_{o′} #(o′, s)
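A minimal Python rendering of this pseudocode, under a couple of assumptions: NULL is represented as an extra source token prepended at position 0, and "till convergence" is approximated by a fixed number of iterations:

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """pairs: list of (o_words, s_words) sentence pairs; returns a dict
    tr with tr[(o, s)] = t(o|s). NULL is prepended to each s here."""
    corpus = [(o, ["NULL"] + s) for o, s in pairs]
    vocab_o = {w for o, _ in corpus for w in o}
    tr = defaultdict(lambda: 1.0 / len(vocab_o))   # uniform initialisation
    for _ in range(iterations):
        # [E]: expected counts via p((j, i)|o, s), equation (12)
        count = defaultdict(float)
        for o, s in corpus:
            for oj in o:
                z = sum(tr[(oj, si)] for si in s)  # denominator, once per j
                for si in s:
                    count[(oj, si)] += tr[(oj, si)] / z
        # [M]: renormalise the counts per source word
        total = defaultdict(float)
        for (oj, si), c in count.items():
            total[si] += c
        tr = defaultdict(float, {(oj, si): c / total[si]
                                 for (oj, si), c in count.items()})
    return tr
```

Note that the [E] step computes the denominator z once per observed word, which is the caching recommended under "Further details" below.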
Further details

from the above outline to real code is a fairly short distance

1. the formula for p((j, i)|o, s) is t(oj|si) / Σ_{i′=0}^{I} t(oj|si′), and the denominator stays the same as i is varied, so this denominator should be calculated once for each j
2. likewise in the [M] step, in #(o, s) / Σ_{o′} #(o′, s) the denominator stays the same as o is varied, so this denominator should be calculated once for each s
Example One

Assuming a corpus of 2 pairs:

  s1  la maison     o1  the house
  s2  la fleur      o2  the flower

initialising all tr(o|s) uniformly to 1/3, the evolution of tr(o|s) looks like this:

  tr(o|s) at each iteration
  Obs     Src     1     2     3     4     5     …      final
  the     la      0.33  0.5   0.6   0.69  0.77  0.84   1.00
  house   la      0.33  0.25  0.2   0.15  0.11  0.081  0.00
  flower  la      0.33  0.25  0.2   0.15  0.11  0.081  0.00
  the     maison  0.33  0.5   0.43  0.36  0.3   0.24   0.00
  house   maison  0.33  0.5   0.57  0.64  0.7   0.76   1.00
  flower  maison  0.33  0.00  0.00  0.00  0.00  0.00   0.00
  the     fleur   0.33  0.5   0.43  0.36  0.3   0.24   0.00
  house   fleur   0.33  0.00  0.00  0.00  0.00  0.00   0.00
  flower  fleur   0.33  0.5   0.57  0.64  0.7   0.76   1.00
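The iteration-2 column of the table can be checked by hand-running one [E]+[M] step; a sketch (NULL is omitted here to match the table, which has no NULL rows):

```python
from collections import defaultdict

# the Example One corpus, without the NULL token
pairs = [(["the", "house"], ["la", "maison"]),
         (["the", "flower"], ["la", "fleur"])]

t = defaultdict(lambda: 1 / 3)          # uniform initialisation, column 1
count, total = defaultdict(float), defaultdict(float)
for o, s in pairs:                      # one [E] step
    for oj in o:
        z = sum(t[(oj, si)] for si in s)
        for si in s:
            count[(oj, si)] += t[(oj, si)] / z
            total[si] += t[(oj, si)] / z
t1 = {k: c / total[k[1]] for k, c in count.items()}  # one [M] step

# matches the iteration-2 column of the table above
assert abs(t1[("the", "la")] - 0.5) < 1e-12
assert abs(t1[("house", "la")] - 0.25) < 1e-12
assert abs(t1[("house", "maison")] - 0.5) < 1e-12
```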
Example Two (Koehn p92)

assuming a corpus of 3 pairs:

  s1  das Haus    o1  the house
  s2  das Buch    o2  the book
  s3  ein Buch    o3  a book

initialising all t(o|s) uniformly to 0.25, the evolution of t(o|s) is

  t(o|s) at each iteration