4CSLL5 IBM Translation Models
Martin Emms
October 22, 2020

Outline
◮ IBM models
    ◮ Probabilities and Translation
    ◮ Alignments
    ◮ IBM Model 1 definitions


The Noisy-Channel formulation

◮ recalling Bayesian classification, finding s from o:

    $\arg\max_s P(s \mid o) = \arg\max_s \frac{P(s,o)}{P(o)}$   (1)
    $\phantom{\arg\max_s P(s \mid o)} = \arg\max_s P(s,o)$   (2)
    $\phantom{\arg\max_s P(s \mid o)} = \arg\max_s P(o \mid s) \times P(s)$   (3)

◮ can then try to factorise $P(o \mid s)$ and $P(s)$ into a clever combination of other probability distributions (not sparse, learnable, allowing solution of the arg-max problem). IBM models 1-5 can be used for $P(o \mid s)$; $P(s)$ is the topic of so-called 'language models'.
◮ The reason for the notation s and o is that (3) is the defining equation of Shannon's 'noisy-channel' formulation of decoding, where an original 'source' s has to be recovered from a noisy observed signal o, the noisiness being defined by $P(o \mid s)$.
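To make (3) concrete, here is a minimal decoding sketch in Python; the candidate sentences and all probability values are invented toy numbers, not anything from the slides:

```python
# Toy noisy-channel decoding: pick the source s maximising P(o|s) * P(s),
# as in equation (3). All sentences and probabilities are made-up toy values.
candidates = ["das Haus ist klein", "das Haus ist gross"]

p_s = {"das Haus ist klein": 0.6,            # language model P(s)
       "das Haus ist gross": 0.4}
p_o_given_s = {"das Haus ist klein": 0.05,   # translation model P(o|s),
               "das Haus ist gross": 0.001}  # for a fixed o = "the house is small"

best = max(candidates, key=lambda s: p_o_given_s[s] * p_s[s])
print(best)  # -> das Haus ist klein
```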

We now have to start looking at the details of the IBM models of $P(o \mid s)$, starting with the very simplest models. What all the models have in common is that they define $P(o \mid s)$ as a combination of other probability distributions.

Alignments

Alignments (informally)

◮ When s and o are translations of each other, one can usually say which pieces of s and o are translations of each other, e.g.

    1    2    3   4
    das Haus ist klein
    the house is small
    1    2    3   4

    1    2    3    4
    das Haus ist klitzeklein
    the house is very small
    1    2    3   4    5

◮ In SMT such a piece-wise correspondence is called an alignment
◮ warning: there are quite a lot of varying formal definitions of alignment

Hidden Alignment

◮ a key feature of the IBM models is to assume there is a hidden alignment a between o and s
◮ so a pair ⟨o, s⟩ from a sentence-aligned corpus is seen as a partial version of the fully observed case: ⟨o, a, s⟩
◮ A model is essentially made of $p(o, a \mid s)$, and having this allows other things to be defined
◮ best translation:

    $\arg\max_s P(s, o) = \arg\max_s \left( \left[ \sum_a p(o, a \mid s) \right] \times p(s) \right)$

◮ best alignment:

    $\arg\max_a \left[ p(o, a \mid s) \right]$

IBM Alignments

◮ Define alignment with a function from posn. j in o to posn. i in s, so a : j → i
◮ the picture

    1    2    3   4
    das Haus ist klein
    the house is small
    1    2    3   4

  represents a : {1 → 1, 2 → 2, 3 → 3, 4 → 4}
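Since an IBM alignment is just a function from o-positions to s-positions, it can be encoded directly as, say, a Python dict; a small sketch of the alignment above (the encoding is an illustration, not something the slides prescribe):

```python
# The alignment a : {1->1, 2->2, 3->3, 4->4} as a dict from
# o-position j (1-based) to s-position a(j) (1-based).
s = ["das", "Haus", "ist", "klein"]
o = ["the", "house", "is", "small"]
a = {1: 1, 2: 2, 3: 3, 4: 4}

for j, word in enumerate(o, start=1):
    print(f"o_{j} = {word!r}  aligned to  s_{a[j]} = {s[a[j] - 1]!r}")
```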

Some weirdness about directions

    1    2    3   4
    das Haus ist klein        a : {1 → 1, 2 → 2, 3 → 3, 4 → 4}
    the house is small
    1    2    3   4

◮ Note here o is English, and s is German
◮ the alignment goes up the page, English-to-German,
◮ it will be used, though, in a model of $P(o \mid s)$, so down the page, German-to-English

Comparison to 'edit distance' alignments

in case you have ever studied 'edit distance' alignments . . .

◮ like edit-dist alignments, it's a function: so one can't align 1 o word with 2 s words
◮ like edit-dist alignments, some s words can be unmapped-to (cf. insertions)
◮ like edit-dist alignments, some o words can be mapped to nothing (cf. deletions)
◮ unlike edit-dist alignments, order is not preserved: $j < j'$ does not imply $a(j) < a(j')$

N-to-1 Alignment (ie. 1-to-N Translation)

    1    2    3    4
    das Haus ist klitzeklein
    the house is very small
    1    2    3   4    5

◮ a : {1 → 1, 2 → 2, 3 → 3, 4 → 4, 5 → 4}
◮ N words of o can be aligned to 1 word of s (needed when 1 word of s translates into N words of o)

Reordering

    1     2   3   4
    klein ist das Haus
    the house is small
    1    2    3   4

◮ a : {1 → 3, 2 → 4, 3 → 2, 4 → 1}
◮ alignment does not preserve o word order (needed when s words are reordered during translation)

s words not mapped to (ie. dropped in translation)

    1    2    3   4  5
    das Haus ist ja klein
    the house is small
    1    2    3   4

◮ a : {1 → 1, 2 → 2, 3 → 3, 4 → 5}
◮ some s words are not mapped-to by the alignment (needed when s words are dropped during translation; here the German flavouring particle 'ja' is dropped)

o words mapped to nothing (ie. inserting in translation)

    0     1   2     3    4   5
    NULL ich gehe nicht zum haus
    I do not go to the house
    1 2  3   4  5  6   7

◮ a : {1 → 1, 2 → 0, 3 → 3, 4 → 2, 5 → 4, 6 → 4, 7 → 5}
◮ some o words are mapped to nothing by the alignment (needed when o words have no clear origin during translation). There is no clear origin in German for the English 'do'; formally this is represented by alignment to a special NULL token
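A convenient encoding for the NULL case, assumed here purely for illustration, is to store the NULL token at index 0 of the s list, so that a(j) = 0 indexes it directly:

```python
# s with the special NULL token stored at index 0, matching the 0-based
# numbering on the slide; a(j) = 0 then indexes NULL directly.
s = ["NULL", "ich", "gehe", "nicht", "zum", "haus"]
o = ["I", "do", "not", "go", "to", "the", "house"]
a = {1: 1, 2: 0, 3: 3, 4: 2, 5: 4, 6: 4, 7: 5}

for j, word in enumerate(o, start=1):
    print(f"{word!r} <- {s[a[j]]!r}")   # 'do' (j=2) comes from NULL
```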

IBM Model 1 definitions

IBM Model 1

◮ basically, a hidden variable a aligning o to s is assumed.
◮ in more detail, IBM Model 1 defines a probability model of $P(o, a, L, s)$, where L is a length for o sentences, and a is an alignment from o sentences of length L to s.
◮ o, a, L are intended to be synchronized, in the sense that if L is not equal to $\ell_o$ the probability is zero. Similarly, if a is not an alignment function from length-L sequences to length-$\ell_s$ sequences, the probability is 0. So we will write $P(o, a, \ell_o, s)$.

Length dependency

◮ first, without any assumptions, via the chain rule:

    $P(o, a, \ell_o, s) = P(o, a, \ell_o \mid s) \times P(s)$

  the IBM Model 1 assumptions are all about $P(o, a, \ell_o \mid s)$. The assumptions can be shown by a succession of applications of the chain rule concerning $(o, a, \ell_o)$
◮ concerning $\ell_o$, still without any particular assumptions

    $P(o, a, \ell_o \mid s) = P(o, a \mid \ell_o, s) \times p(\ell_o \mid s)$

  An assumption of IBM Model 1 is that the dependency $p(\ell_o \mid s)$ can be expressed as a dependency just on the length $\ell_s$, so by some distribution $p(L \mid \ell_s)$.
◮ Usually it is stated that $p(L \mid \ell_s)$ is uniform: ie. all L equally likely
◮ We will see in a while that for many of the vital calculations for training the model, the actual values of $p(L \mid \ell_s)$ are irrelevant

Alignment dependency

◮ we have so far

    $P(o, a, \ell_o \mid s) = P(o, a \mid \ell_o, s) \times p(\ell_o \mid \ell_s)$

◮ analysing $P(o, a \mid \ell_o, s)$, a further application of the chain rule gives

    $P(o, a \mid \ell_o, s) = P(o \mid a, \ell_o, s) \times P(a \mid \ell_o, s)$   (4)

◮ The next assumption is that the dependency $P(a \mid \ell_o, s)$ can be expressed as a dependency just on $\ell_s$ and $\ell_o$, and furthermore that the distribution over possible alignments from length-$\ell_o$ sequences to length-$\ell_s$ sequences is uniform
◮ There are $\ell_o$ members of o to be aligned, and for each there are $\ell_s + 1$ possibilities (including NULL mappings), so there are $(\ell_s + 1)^{\ell_o}$ possible alignments, so this means

    $p(a \mid \ell_o, \ell_s) = \dfrac{1}{(\ell_s + 1)^{\ell_o}}$
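The count $(\ell_s + 1)^{\ell_o}$ can be verified by brute-force enumeration; a small sketch with toy lengths:

```python
from itertools import product

l_s, l_o = 2, 3   # toy lengths, kept small so enumeration stays cheap

# Each of the l_o o-positions maps independently to one of 0..l_s
# (0 being NULL), so the alignments are exactly {0,...,l_s}^{l_o}.
alignments = list(product(range(l_s + 1), repeat=l_o))
assert len(alignments) == (l_s + 1) ** l_o   # 27 here
print(len(alignments), "alignments, each with probability", 1 / len(alignments))
```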

Observed words dependency

◮ this means the formula for $P(o, a \mid \ell_o, s)$ from (4) now looks like this

    $P(o, a \mid \ell_o, s) = P(o \mid a, \ell_o, s) \times \dfrac{1}{(\ell_s + 1)^{\ell_o}}$   (5)

◮ finally, concerning $P(o \mid a, \ell_o, s)$, it is assumed that this probability takes a particularly simple multiplicative form, with each $o_j$ treated as independent of everything else given the word in s that it is aligned to, that is, $s_{a(j)}$, so

    $p(o \mid a, \ell_o, s) = \prod_j p(o_j \mid s_{a(j)})$

◮ and $P(o, a \mid \ell_o, s)$ becomes

    $P(o, a \mid \ell_o, s) = \dfrac{1}{(\ell_s + 1)^{\ell_o}} \times \prod_j [p(o_j \mid s_{a(j)})]$   (6)

The final IBM Model 1 formula

    $P(o, a, \ell_o \mid s) = \prod_j [p(o_j \mid s_{a(j)})] \times \dfrac{1}{(\ell_s + 1)^{\ell_o}} \times p(\ell_o \mid \ell_s)$

or slightly more compactly

    $P(o, a, \ell_o \mid s) = \dfrac{p(\ell_o \mid \ell_s)}{(\ell_s + 1)^{\ell_o}} \times \prod_j [p(o_j \mid s_{a(j)})]$   (7)
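Formula (7) translates directly into code. A minimal sketch, assuming the t-table is supplied as a dict and that s carries the NULL token at index 0:

```python
from math import prod

def model1_prob(o, s, a, t, p_len):
    """P(o, a, l_o | s) per formula (7).

    o     : list of observed words
    s     : list of source words, with s[0] == "NULL"
    a     : dict from o-position j (1-based) to s-position a(j) (0 = NULL)
    t     : t[(o_word, s_word)] = p(o_word | s_word)
    p_len : the length term p(l_o | l_s)
    """
    l_o, l_s = len(o), len(s) - 1   # s includes the NULL token
    lexical = prod(t[(o[j - 1], s[a[j]])] for j in range(1, l_o + 1))
    return p_len * lexical / (l_s + 1) ** l_o
```

With the das Haus ist klein / the house is small pair, the identity alignment, and the t-table from the Example slides below, this returns p_len × 0.1792 / 625.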

the 'generative' story

Another way to arrive at the formula is via the following so-called 'generative story' for generating o from s:

1. choose a length $\ell_o$, according to a distribution $p(\ell_o \mid \ell_s)$
2. choose an alignment a from $1 \ldots \ell_o$ to $0, 1, \ldots, \ell_s$, according to the distribution $p(a \mid \ell_s, \ell_o) = \frac{1}{(\ell_s + 1)^{\ell_o}}$
3. for $j = 1$ to $j = \ell_o$, choose $o_j$ according to the distribution $p(o_j \mid s_{a(j)})$
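The generative story can also be run literally as a sampler. A sketch, assuming toy stand-ins: a bounded uniform choice replaces $p(\ell_o \mid \ell_s)$, and the translation table t is supplied by the caller:

```python
import random

def generate_o(s, t, max_len=10):
    """Sample o from s by the Model 1 generative story.

    s: source words with s[0] == "NULL"
    t: t[s_word] = (list of o_words, list of their probabilities);
       t must have an entry for "NULL" as well.
    """
    l_s = len(s) - 1
    l_o = random.randint(1, max_len)                  # step 1: stand-in for p(l_o | l_s)
    a = [random.randint(0, l_s) for _ in range(l_o)]  # step 2: uniform over (l_s+1)^l_o
    o = []
    for j in range(l_o):                              # step 3: o_j ~ p(. | s_a(j))
        words, probs = t[s[a[j]]]
        o.append(random.choices(words, weights=probs)[0])
    return o, a
```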

Example [1]

◮ Suppose s is das Haus ist klein and o is the house is small. Recall the alignment from o to s shown earlier:

    1    2    3   4
    das Haus ist klein        a : {1 → 1, 2 → 2, 3 → 3, 4 → 4}
    the house is small
    1    2    3   4

◮ we will illustrate the value of $P(o, a, \ell_o \mid s)$ in this case, according to formula (7)

    $P(o, a, \ell_o \mid s) = \dfrac{p(\ell_o \mid \ell_s)}{(\ell_s + 1)^{\ell_o}} \times \prod_j [p(o_j \mid s_{a(j)})]$

[1] see p. 87 of the Koehn book

Example cntd

suppose the following tables, giving t(e|g) for various German and English words:

    das              Haus                ist              klein
    e       t(e|g)   e          t(e|g)   e       t(e|g)   e       t(e|g)
    the     0.7      house      0.8      is      0.8      small   0.4
    that    0.15     building   0.16     's      0.16     little  0.4
    which   0.075    home       0.02     exists  0.02     short   0.1
    who     0.05     household  0.015    has     0.015    minor   0.06
    this    0.025    shell      0.005    are     0.005    petty   0.04

let ε represent the $P(\ell_o = 4 \mid \ell_s = 4)$ term
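Completing the computation with the table entries above, the identity alignment and $\ell_s = \ell_o = 4$, formula (7) gives:

    $P(o, a, \ell_o \mid s) = \dfrac{\epsilon}{(4 + 1)^4} \times (0.7 \times 0.8 \times 0.8 \times 0.4) = \dfrac{0.1792}{625}\,\epsilon \approx 0.00029\,\epsilon$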
