linguistics 384 language and computers

Linguistics 384: Language and Computers

Topic 5: Machine Translation
Outline: Introduction; Examples for Translations; Background: Dictionaries; Transformer approaches; Linguistic knowledge based systems; Direct transfer systems; Interlingua-based systems; Machine learning based systems; Alignment; What makes MT hard?; Evaluating MT systems; References


  1. What is MT not good for?
     ◮ Things that require subtle knowledge of the world and/or a high degree of (literary) skill:
       ◮ translating Shakespeare into Navaho
       ◮ diplomatic negotiations
       ◮ court proceedings
       ◮ ...
     ◮ Things that may be a life or death situation:
       ◮ Pharmaceutical business
       ◮ Automatically translating frantic 911 calls for a dispatcher who speaks only Spanish

  4. Example translations: The simple case
     ◮ It will help to look at a few examples of real translation before talking about how a machine does it.
     ◮ Take the simple Spanish sentence and its English translation below:
       (1) Yo hablo español.
           I speak[1st, sg] Spanish
           ‘I speak Spanish.’
     ◮ Words in this example pretty much translate one-for-one.
     ◮ But we have to make sure hablo matches with Yo, i.e., that the subject agrees with the form of the verb.

  5. Example translations: A slightly more complex case
     The order and number of words can differ:
       (2) a. Tu hablas español?
              You speak[2nd, sg] Spanish
              ‘Do you speak Spanish?’
           b. Hablas español?
              Speak[2nd, sg] Spanish
              ‘Do you speak Spanish?’

  9. What goes into a translation
     Some things to note about these examples and thus what we might need to know to translate:
     ◮ Words have to be translated. → dictionaries
     ◮ Words are grouped into meaningful units (cf. our discussion of syntax for grammar checkers).
     ◮ Word order can differ from language to language.
     ◮ The forms of words within a sentence are systematic, e.g., verbs have to be conjugated, etc.

  10. Different approaches to MT
     ◮ Transformer systems
     ◮ Systems based on linguistic knowledge
       ◮ Direct transfer systems
       ◮ Interlinguas
     ◮ Machine learning approaches
     Most of these use dictionaries in one form or another, so we will start by looking at dictionaries.

  12. Dictionaries
     An MT dictionary differs from a “paper” dictionary:
     ◮ must be computer-usable (electronic form, indexed)
     ◮ needs to be able to handle various word inflections: have is the dictionary entry, but we want the entry to specify how to conjugate this verb.

  16. Dictionaries (cont.)
     ◮ contains (syntactic and semantic) restrictions that a word places on other words
       ◮ e.g., subcategorization information: give needs a giver, a person given to, and an object that is given
       ◮ e.g., selectional restrictions: if X is eating, then X must be animate
     ◮ may also contain frequency information
     ◮ can be hierarchically organized, e.g.:
       ◮ all nouns have person, number, and gender
       ◮ verbs (unless irregular) conjugate in the past tense by adding ed.
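
One way to picture a hierarchically organized dictionary is a default rule that individual entries override. The sketch below is only an illustration under assumptions: the class `VerbEntry`, its field names, and the role labels are hypothetical and are not taken from any particular MT system.

```python
from dataclasses import dataclass, field
from typing import Optional


def regular_past(lemma: str) -> str:
    """Default rule shared by all regular verbs: add -ed."""
    return lemma + "ed"


@dataclass
class VerbEntry:
    lemma: str
    # Subcategorization: the participants the verb requires (labels are illustrative).
    roles: tuple = ()
    # Selectional restrictions: role -> feature the filler must have (illustrative).
    restrictions: dict = field(default_factory=dict)
    # Irregular verbs store their own past tense and so override the default rule.
    irregular_past: Optional[str] = None

    def past(self) -> str:
        return self.irregular_past or regular_past(self.lemma)


LEXICON = {
    "turn": VerbEntry("turn", roles=("agent", "object")),
    "give": VerbEntry("give", roles=("giver", "recipient", "object"),
                      irregular_past="gave"),
    "eat": VerbEntry("eat", roles=("eater", "food"),
                     restrictions={"eater": "animate"}, irregular_past="ate"),
}

print(LEXICON["turn"].past())  # regular verb: falls back to the default rule
print(LEXICON["give"].past())  # irregular verb: uses its stored form
```

Only the irregular verbs need to say anything about the past tense; everything else is inherited from the default.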

  19. What dictionary entries might look like
     ◮ word: button
       part of speech: noun
       […]: no
       […]: yes
       German: Knopf
     ◮ word: knowledge
       part of speech: noun
       […]: no
       […]: no
       German: Wissen, Kenntnisse
     ◮ There can be extra rules which tell you whether to choose Wissen or Kenntnisse.

  20. A dictionary entry with frequency
     ◮ word: knowledge
       part of speech: noun
       […]: no
       […]: no
       German: Wissen: 80%, Kenntnisse: 20%
     ◮ Probabilities can be derived from various machine learning techniques → to be discussed later.
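
As a rough illustration of how frequency information could be used, the sketch below stores the entry for knowledge as a Python dict and picks the most probable German translation. The key names (`word`, `pos`, `german`) and the function `most_likely` are assumptions made for this example; only the words and the 80%/20% split come from the slide.

```python
# Hypothetical encoding of the dictionary entry; key names are illustrative.
ENTRY = {
    "word": "knowledge",
    "pos": "noun",
    "german": {"Wissen": 0.8, "Kenntnisse": 0.2},
}


def most_likely(entry: dict, target: str = "german") -> str:
    """Return the candidate translation with the highest probability."""
    candidates = entry[target]
    return max(candidates, key=candidates.get)


print(most_likely(ENTRY))
```

Always taking the most frequent candidate is the simplest possible policy; extra rules, as mentioned for Wissen versus Kenntnisse, could override it in particular contexts.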

  24. Transformer approaches
     ◮ Transformer architectures transform example sentences from one language into another.
     ◮ They consist of
       ◮ a grammar for the source/input language
       ◮ a source-to-target language dictionary
       ◮ source-to-target language rules
     ◮ Note that there is no grammar for the target language, only mappings from the source language.

  28. An example for the transformer approach
     We’ll work through a German-to-English example.
       (3) a. Drehen Sie den Knopf eine Position zurück.
           b. Turn the button back one position.
     1. Using the grammar, assign parts-of-speech:
       (4) Drehen  Sie    den      Knopf  eine     Position  zurück.
           verb    pron.  article  noun   article  noun      prep.
     2. Using the grammar, give the sentence a (basic) structure:
       (5) Drehen Sie [den Knopf] [eine Position] zurück.

  33. An example (cont.)
     3. Using the dictionary, find the target language words:
       (6) Drehen  Sie  [den  Knopf]   [eine  Position]  zurück.
           turn    you   the  button    one   position   back
     4. Using the source-to-target rules, reorder, combine, eliminate, or add target language words, e.g.,
       ◮ ’turn’ and ’back’ form one unit.
       ◮ because ’Drehen ... zurück’ is a command, in English it is expressed without ’you’.
     ⇒ End result: Turn back the button one position.
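
Steps 3 and 4 above can be sketched in a few lines of Python. This is a toy under stated assumptions: the word list `DICTIONARY` and the two rule functions are hypothetical and cover only this one sentence, and steps 1 and 2 (part-of-speech assignment and bracketing) are skipped entirely.

```python
# Hypothetical German-to-English word dictionary for the example sentence only.
DICTIONARY = {
    "drehen": "turn", "sie": "you", "den": "the", "knopf": "button",
    "eine": "one", "position": "position", "zurück": "back",
}


def lookup(sentence: str) -> list:
    """Step 3: replace each source word with its dictionary translation."""
    words = sentence.rstrip(".").split()
    return [DICTIONARY[w.lower()] for w in words]


def drop_imperative_subject(words: list) -> list:
    """Rule: a verb-first command is expressed without 'you' in English."""
    if len(words) > 1 and words[1] == "you":
        return [words[0]] + words[2:]
    return words


def join_particle(words: list) -> list:
    """Rule: 'turn' and 'back' form one unit, so move 'back' next to 'turn'."""
    if words and words[0] == "turn" and words[-1] == "back":
        return ["turn", "back"] + words[1:-1]
    return words


def translate(sentence: str) -> str:
    """Step 4: apply the source-to-target rules to the looked-up words."""
    words = join_particle(drop_imperative_subject(lookup(sentence)))
    return " ".join(words).capitalize() + "."


print(translate("Drehen Sie den Knopf eine Position zurück."))
```

Nothing in this sketch checks the output against a grammar of English; the rules only rearrange what the dictionary returned, which is the defining property of the transformer approach.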

  35. Transformers: Less than meets the eye
     ◮ By their very nature, transformer systems are non-reversible because they lack a target language grammar.
       If we have a German to English translation system, for example, we are incapable of translating from English to German.
     ◮ However, as these systems do not require sophisticated knowledge of the target language, they are usually very robust: they will return a result for nearly any input sentence.

  36. Linguistic knowledge-based systems
     ◮ Linguistic knowledge-based systems include knowledge of both the source and the target languages.
     ◮ We will look at direct transfer systems and then the more specific instance of interlinguas.
       ◮ Direct transfer systems
       ◮ Interlinguas

  39. Direct transfer systems
     A direct transfer system consists of:
     ◮ A source language grammar
     ◮ A target language grammar
     ◮ Rules relating source language underlying representation to target language underlying representation

  41. Direct transfer systems (cont.)
     ◮ A direct transfer system has a transfer component which relates a source language representation with a target language representation.
     ◮ This can also be called a comparative grammar.
     ◮ We’ll walk through the following French to English example:
       (7) Londres plaît à Sam.
           London is pleasing to Sam
           ‘Sam likes London.’

  48. Steps in a transfer system
     1. The source language grammar analyzes the input and puts it into an underlying representation (UR).
        Londres plaît à Sam → Londres plaire Sam (source UR)
     2. The transfer component relates this source language UR (French UR) to a target language UR (English UR).
        French UR          English UR
        X plaire Y    ↔    Eng(Y) like Eng(X)
        (where Eng(X) means the English translation of X)
        Londres plaire Sam (source UR) → Sam like London (target UR)
     3. The target language grammar translates the target language UR into an actual target language sentence.
        Sam like London → Sam likes London.
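
The three steps can be sketched as three small functions, one per component. Everything here is a simplification under assumptions: the lookup tables `FR_LEMMAS` and `FR_EN`, the function names, and the representation of a UR as a plain `(X, verb, Y)` tuple are all hypothetical, and the "grammars" handle only this one sentence pattern.

```python
# Hypothetical lookup tables covering only the example sentence.
FR_LEMMAS = {"plaît": "plaire"}
FR_EN = {"Londres": "London", "Sam": "Sam"}


def analyze_french(sentence: str) -> tuple:
    """Step 1: source grammar maps the input to a source UR (X, verb lemma, Y)."""
    tokens = [t for t in sentence.rstrip(".").split() if t != "à"]
    x, verb, y = tokens
    return (x, FR_LEMMAS[verb], y)


def transfer(ur: tuple) -> tuple:
    """Step 2: transfer rule  X plaire Y  ->  Eng(Y) like Eng(X)."""
    x, verb, y = ur
    if verb != "plaire":
        raise ValueError(f"no transfer rule for {verb!r}")
    return (FR_EN[y], "like", FR_EN[x])


def generate_english(ur: tuple) -> str:
    """Step 3: target grammar adds third-person singular agreement (like -> likes)."""
    subject, verb, obj = ur
    return f"{subject} {verb}s {obj}."


source_ur = analyze_french("Londres plaît à Sam.")
target_ur = transfer(source_ur)
print(source_ur, target_ur, generate_english(target_ur))
```

Because agreement is added only in the last function, the transfer rule itself never has to mention English inflection, which is the division of labor the slide describes.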

  52. Things to note about transfer systems
     ◮ The transfer mechanism is essentially reversible; e.g., the plaire rule works in both directions (at least in theory).
     ◮ Because we have a separate target language grammar, we are able to ensure that the rules of English apply; like → likes.
     ◮ Word order is handled differently than with transformers: the URs are essentially unordered.
     ◮ The underlying representation can be of various levels of abstraction (words, syntactic trees, meaning representations, etc.); we will talk about this with the translation triangle.

  53. Caveat about reversibility
      ◮ It seems like reversible rules are highly desirable, and in general they are, but we may not always want reversible rules.
          ◮ e.g., Dutch aanvangen should be translated into English as begin, but English begin should be translated into Dutch as beginnen.
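One way to picture this caveat is to keep a separate lexical table for each translation direction instead of inverting a single table. The sketch below is illustrative only: the table names, the `translate_word` and `invert` helpers, and the extra entry for beginnen are assumptions, not part of any system described in these slides.

```python
# Hypothetical direction-specific lexicons (illustrative only).
NL_TO_EN = {"aanvangen": "begin", "beginnen": "begin"}  # "beginnen" entry is assumed
EN_TO_NL = {"begin": "beginnen"}


def translate_word(word, lexicon):
    """Look up a word; copy it unchanged if the lexicon has no rule for it."""
    return lexicon.get(word, word)


def invert(lexicon):
    """Naively reverse a lexicon by swapping source and target."""
    return {target: source for source, target in lexicon.items()}


# Reversing the single rule aanvangen -> begin gives the unwanted begin -> aanvangen.
naive_reverse = invert({"aanvangen": "begin"})

# With separate tables, a Dutch -> English -> Dutch round trip ends at "beginnen".
round_trip = translate_word(translate_word("aanvangen", NL_TO_EN), EN_TO_NL)
```

Keeping the two directions separate costs some duplication, but it lets each direction pick its own preferred target word.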

  56. Levels of abstraction
      ◮ There are differing levels of abstraction at which transfer can take place. So far we have looked at URs that represent only word information.
      ◮ We can do a full syntactic analysis, which helps us to know how the words in a sentence relate.
      ◮ Or we can do only a partial syntactic analysis, such as representing the dependencies between words.

  57. Czech-English example
      (8) Kaufman & Broad odmítla  institucionální investory jmenovat.
          Kaufman & Broad declined institutional   investors to name/identify
          ‘Kaufman & Broad refused to name the institutional investors.’
      Example taken from Čmejrek, Cuřín, and Havelka (2003).
      ◮ They find the base forms of words (e.g., odmítnout ‘to decline’ instead of odmítla ‘declined’).
      ◮ They find which words depend on which other words and represent this in a tree (e.g., the noun investory depends on the verb jmenovat).
      ◮ This dependency tree is then converted to English (comparative grammar) and re-ordered as appropriate.

  58. Dependency tree for Czech-English example
      odmítnout ‘decline’
          & ‘&’
              Kaufman ‘Kaufman’
              Broad ‘Broad’
          jmenovat ‘name’
              investor ‘investor’
                  institucionální ‘institutional’
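The lexical part of this conversion can be sketched as a recursive walk over the tree. This is a minimal illustration, not the method of Čmejrek, Cuřín, and Havelka: the `Node` class, the `LEXICON` table, and the `transfer` and `lemmas` helpers are all assumed names, and re-ordering and inflection of the English output are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A dependency-tree node: a base form plus its dependents."""
    lemma: str
    children: list = field(default_factory=list)


# The Czech tree from the example, using base forms.
czech = Node("odmítnout", [
    Node("&", [Node("Kaufman"), Node("Broad")]),
    Node("jmenovat", [Node("investor", [Node("institucionální")])]),
])

# Toy transfer lexicon (assumed); lemmas without an entry are copied as-is.
LEXICON = {
    "odmítnout": "decline",
    "jmenovat": "name",
    "institucionální": "institutional",
}


def transfer(node):
    """Translate every lemma while keeping the tree shape unchanged."""
    return Node(LEXICON.get(node.lemma, node.lemma),
                [transfer(child) for child in node.children])


def lemmas(node):
    """List the lemmas of a tree, head first, then each subtree in order."""
    result = [node.lemma]
    for child in node.children:
        result.extend(lemmas(child))
    return result


english = transfer(czech)
print(lemmas(english))
```

Because the tree records only which word depends on which, a later step is still needed to put the English words in order and inflect them.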

  61. Interlinguas
      ◮ Ideally, we could use an interlingua = a language-independent representation of meaning.
      ◮ Benefit: To add new languages to your MT system, you merely have to provide mapping rules between your language and the interlingua, and then you can translate into any other language in your system.
      ◮ What your interlingua looks like depends on your goals; an example for I shot the sheriff. is shown on the following slide.
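The benefit can be made concrete by counting modules. The sketch below assumes one transfer module per ordered language pair, and one analysis plus one generation module per language for the interlingua design. Both function names are made up for illustration, and fully reversible transfer rules would halve the first count.

```python
def transfer_modules(n_languages):
    """One transfer module for every ordered (source, target) pair."""
    return n_languages * (n_languages - 1)


def interlingua_modules(n_languages):
    """One mapping into and one mapping out of the interlingua per language."""
    return 2 * n_languages


# Cost of adding an eleventh language to a ten-language system.
extra_transfer = transfer_modules(11) - transfer_modules(10)
extra_interlingua = interlingua_modules(11) - interlingua_modules(10)
print(transfer_modules(10), interlingua_modules(10))
print(extra_transfer, extra_interlingua)
```

Under these assumptions the transfer design grows quadratically with the number of languages, while the interlingua design grows linearly.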

  62. Interlingua example
      [Figure: attribute-value matrix for I shot the sheriff. Values shown include wound, gun, past, maybe; speaker, first, sg; sheriff, third, singular, kind of job, officer.]

  64. Interlingual problems
      ◮ What exactly should be represented in the interlingua?
          ◮ e.g., English corner = Spanish rincón = ‘inside corner’ or esquina = ‘outside corner’
      ◮ A fine-grained interlingua can require extra (unnecessary) work:
          ◮ e.g., Japanese distinguishes older brother from younger brother, so we have to disambiguate English brother to put it into the interlingua. Then, if we translate into French, we have to ignore the disambiguation and simply translate it as frère, which simply means ‘brother’.

  65. The translation triangle
      [Figure: the translation triangle. Interlingua is at the top, Source and Target are at the base, and Transfer System lies between them. Vertical axis: depth of analysis. Horizontal axis: size of comparative grammar between languages.]

  67. Machine learning
      ◮ Instead of trying to tell the MT system how we’re going to translate, we might try a machine learning approach = the computer will learn how to translate based on example translations.
      ◮ For this, we need
          ◮ examples of translations as training data, and
          ◮ a way of learning from that data.

  71. Using frequency (statistical methods)
      ◮ We can look at how often a source language word is translated as a target language word, i.e., the frequency of a given translation, and choose the most frequent translation.
      ◮ But how can we tell what a word is being translated as? There are two different cases:
          ◮ We are told what each word is translated as: text alignment
          ◮ We are not told what each word is translated as: use a bag of words
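One plausible reading of the second case is to treat each pair of aligned sentences as two unordered bags of words and count which target words turn up alongside each source word. The sketch below follows that reading. The three-sentence corpus is invented, `cooc` is an assumed name, and the slides may define the bag-of-words procedure differently.

```python
from collections import Counter, defaultdict

# Toy sentence-aligned corpus (invented for illustration).
sentence_pairs = [
    ("yo hablo español", "i speak spanish"),
    ("yo hablo inglés", "i speak english"),
    ("hablo español", "i speak spanish"),
]

# cooc[source_word][target_word] = number of sentence pairs containing both.
cooc = defaultdict(Counter)
for source_sentence, target_sentence in sentence_pairs:
    for source_word in set(source_sentence.split()):
        for target_word in set(target_sentence.split()):
            cooc[source_word][target_word] += 1

for source_word in sorted(cooc):
    print(source_word, sorted(cooc[source_word].items()))
```

Raw counts are only a starting point: a frequent target word such as i co-occurs with every source word, so español is paired with i and speak just as often as with spanish in this tiny corpus.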

  74. Text alignment
      Sometimes humans have provided informative training data:
      ◮ sentence alignment
      ◮ word alignment
      The process of text alignment can also be automated and then used to train an MT system.

  76. Sentence alignment
      ◮ sentence alignment = determine which source language sentences align with which target language ones (what we assumed in the bag of words example).
      ◮ Intuitively easy, but can be difficult in practice since different languages have different punctuation conventions.

  79. Word alignment
      ◮ word alignment = determine which source language words align with which target language ones
      ◮ Much harder than sentence alignment to do automatically.
      ◮ But if it has already been done for us, it gives us good information about what a word’s translation equivalent is.

  83. Different word alignments
      ◮ One word can map to one word or to multiple words. Likewise, sometimes it is best for multiple words to align with multiple words.
      ◮ English-Russian examples:
          ◮ one-to-one: khorosho = well
          ◮ one-to-many: kniga = the book
          ◮ many-to-one: to take a walk = gulyat’
          ◮ many-to-many: at least = khotya by (‘although if/would’)
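A word alignment is often stored as a set of (source position, target position) links. The sketch below uses that representation to classify the four examples above, treating the left-hand side of each "=" as the source. The helper names `full_links` and `alignment_type` are assumptions made for illustration.

```python
def full_links(source_words, target_words):
    """Link every source position to every target position in one aligned group."""
    return {(i, j)
            for i in range(len(source_words))
            for j in range(len(target_words))}


def alignment_type(links):
    """Name a group of links by how many positions it covers on each side."""
    source_positions = {i for i, _ in links}
    target_positions = {j for _, j in links}
    left = "one" if len(source_positions) == 1 else "many"
    right = "one" if len(target_positions) == 1 else "many"
    return f"{left}-to-{right}"


examples = [
    (["khorosho"], ["well"]),
    (["kniga"], ["the", "book"]),
    (["to", "take", "a", "walk"], ["gulyat'"]),
    (["at", "least"], ["khotya", "by"]),
]
types = [alignment_type(full_links(source, target)) for source, target in examples]
print(types)
```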

  84. Calculating probabilities
      ◮ With word alignments, it is relatively easy to calculate probabilities.
          ◮ e.g., What is the probability that run translates as correr in Spanish?
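A simple estimate is the relative frequency: the number of times run is aligned with correr, divided by the number of times run is aligned with anything. The sketch below computes that estimate from invented counts. The `aligned` data and the `translation_prob` name are assumptions, not figures from a real corpus.

```python
from collections import Counter

# Invented word-aligned pairs (English word, Spanish word), for illustration only.
aligned = (
    [("run", "correr")] * 6
    + [("run", "funcionar")] * 3
    + [("run", "dirigir")] * 1
    + [("walk", "caminar")] * 4
)

pair_counts = Counter(aligned)
source_counts = Counter(source for source, _ in aligned)


def translation_prob(source, target):
    """Relative-frequency estimate of P(target | source); 0.0 for unseen sources."""
    if source_counts[source] == 0:
        return 0.0
    return pair_counts[(source, target)] / source_counts[source]


print(translation_prob("run", "correr"))
```

With these counts, run is aligned ten times in total and six of those are with correr, so the estimate is 6/10.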
