
Language and Computers Topic 5: Machine Translation Introduction

Examples for Translations

Background: Dictionaries Transformer approaches Linguistic knowledge based systems

Direct transfer systems Interlingua-based systems

Machine learning based systems

Alignment

What makes MT hard? Evaluating MT systems References

Linguistics 384: Language and Computers

Topic 5: Machine Translation

Scott Martin∗

Dept. of Linguistics, OSU

Spring 2008

∗ The course was created by Chris Brew, Markus Dickinson and Detmar Meurers.


Outline

◮ Introduction
◮ Background: Dictionaries
◮ Transformer approaches
◮ Linguistic knowledge based systems
◮ Machine learning based systems
◮ What makes MT hard?
◮ Evaluating MT systems
◮ References


What is Machine Translation?

Translation is the process of:

◮ moving texts from one (human) language (the source language) to another (the target language),

◮ in a way that preserves meaning.

Machine translation (MT) automates (part of) the process:

◮ Fully automatic translation
◮ Computer-aided (human) translation


What is MT good for?

◮ When you need the gist of something and there are no human translators around:
  ◮ translating e-mails & webpages
  ◮ obtaining information from sources in multiple languages (e.g., search engines)

◮ If you have a limited vocabulary and a small range of sentence types:
  ◮ translating weather reports
  ◮ translating technical manuals
  ◮ translating terms in scientific meetings
  ◮ determining if certain words or ideas appear in suspected terrorist documents → help pin down which documents need to be looked at closely

◮ If you want your human translators to focus on interesting/difficult sentences while avoiding lookup of unknown words and translation of mundane sentences.


Is MT needed?

◮ Translation is of immediate importance for multilingual countries (Canada, India, Switzerland, . . . ), international institutions (United Nations, International Monetary Fund, World Trade Organization, . . . ), and multinational or exporting companies.

◮ The European Union had 11 official languages until May 1, 2004, when the number rose to 20 (and to 23 in January 2007). All EU legislation and other official documents have to be translated into all of them.


What is MT not good for?

◮ Things that require subtle knowledge of the world and/or a high degree of (literary) skill:
  ◮ translating Shakespeare into Navaho
  ◮ diplomatic negotiations
  ◮ court proceedings
  ◮ . . .

◮ Situations where a mistranslation may be a matter of life or death:
  ◮ pharmaceutical business
  ◮ automatically translating frantic 911 calls for a dispatcher who speaks only Spanish


Example translations

The simple case

◮ It will help to look at a few examples of real translation before talking about how a machine does it.

◮ Take the simple Spanish sentence and its English translation below:

(1) Yo   hablo          español.
    I    speak(1st,sg)  Spanish
    'I speak Spanish.'

◮ Words in this example pretty much translate one-for-one.
◮ But we have to make sure hablo matches with Yo, i.e., that the subject agrees with the form of the verb.
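The agreement check can be pictured as a comparison of features. The following is a minimal Python sketch, assuming hand-written toy tables (PRONOUNS and HABLAR_PRESENT are hypothetical names covering only a few forms); a real system would obtain person and number from a morphological analyzer rather than a fixed list.

```python
# Hypothetical toy tables: each form maps to its (person, number) features.
PRONOUNS = {"yo": (1, "sg"), "tú": (2, "sg"), "nosotros": (1, "pl")}
HABLAR_PRESENT = {"hablo": (1, "sg"), "hablas": (2, "sg"), "hablamos": (1, "pl")}


def agrees(subject: str, verb: str) -> bool:
    """True if the subject pronoun and the verb form share person and number.

    Words missing from the toy tables raise KeyError.
    """
    subject_features = PRONOUNS[subject.lower()]
    verb_features = HABLAR_PRESENT[verb.lower()]
    return subject_features == verb_features


print(agrees("Yo", "hablo"))   # True
print(agrees("Yo", "hablas"))  # False
```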


Example translations

A slightly more complex case

The order and number of words can differ:

(2) a. ¿Tú   hablas          español?
       You   speak(2nd,sg)   Spanish
       'Do you speak Spanish?'
    b. ¿Hablas          español?
       Speak(2nd,sg)    Spanish
       'Do you speak Spanish?'
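Example (2) shows two differences at once: Spanish can omit the subject pronoun because the verb ending already marks person and number, while English needs an overt subject and adds do in yes/no questions. A minimal Python sketch of that reasoning, assuming hypothetical toy lexicons that cover only the words in (2); third-person subjects (which would need does) are not handled.

```python
# Hypothetical toy lexicons covering only the words in (2).
VERBS = {
    "hablo": ("speak", 1, "sg"),   # verb form -> (English verb, person, number)
    "hablas": ("speak", 2, "sg"),
}
SPANISH_PRONOUNS = {"yo": (1, "sg"), "tú": (2, "sg")}
ENGLISH_PRONOUNS = {(1, "sg"): "I", (2, "sg"): "you"}
NOUNS = {"español": "Spanish"}


def translate_question(tokens):
    """Translate a Spanish '[pronoun] verb object' question into English."""
    words = [t.lower() for t in tokens]
    if words[0] in SPANISH_PRONOUNS:
        # (2a): the explicit pronoun repeats what the verb ending already says.
        words = words[1:]
    verb, obj = words
    english_verb, person, number = VERBS[verb]
    # The English subject is recovered from the verb's person and number.
    subject = ENGLISH_PRONOUNS[(person, number)]
    # English yes/no questions with a main verb need the auxiliary 'do'.
    return f"Do {subject} {english_verb} {NOUNS[obj]}?"


print(translate_question(["Tú", "hablas", "español"]))  # Do you speak Spanish?
print(translate_question(["Hablas", "español"]))        # Do you speak Spanish?
```

Both (2a) and (2b) come out the same, one word longer than the (2b) input, because the English subject and do have to be supplied.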


What goes into a translation

Some things to note about these examples, and thus about what we need to know in order to translate:

◮ Words have to be translated. → dictionaries
◮ Words are grouped into meaningful units (cf. our discussion of syntax for grammar checkers).
◮ Word order can differ from language to language.
◮ The forms of words within a sentence are systematic, e.g., verbs have to be conjugated.


Different approaches to MT

◮ Transformer systems
◮ Systems based on linguistic knowledge
  ◮ Direct transfer systems
  ◮ Interlinguas
◮ Machine learning approaches

Most of these use dictionaries in one form or another, so we will start by looking at dictionaries.


Dictionaries

An MT dictionary differs from a “paper” dictionary:

◮ it must be computer-usable (electronic form, indexed)
◮ it needs to handle the various inflections of a word: have is the dictionary entry, but we want the entry to specify how to conjugate this verb (has, had, having).
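To make the inflection point concrete, here is a minimal Python sketch, assuming a hypothetical Entry record whose forms table lists each inflected form of have (the field names and feature labels are illustrative, not a standard format). Indexing every form lets the system find the entry from any surface word, which is one reason the dictionary must be indexed; an ambiguous form such as had comes back with more than one analysis.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """A toy MT dictionary entry (field names are hypothetical)."""
    lemma: str
    pos: str
    # Maps a feature label to the corresponding inflected surface form.
    forms: dict = field(default_factory=dict)


HAVE = Entry(
    lemma="have",
    pos="verb",
    forms={
        "base": "have",
        "3sg.present": "has",
        "past": "had",
        "present participle": "having",
        "past participle": "had",
    },
)


def build_index(entries):
    """Map every inflected form back to (lemma, features) analyses."""
    index = {}
    for entry in entries:
        for features, form in entry.forms.items():
            index.setdefault(form, []).append((entry.lemma, features))
    return index


index = build_index([HAVE])
print(index["has"])  # [('have', '3sg.present')]
print(index["had"])  # [('have', 'past'), ('have', 'past participle')]
```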


Dictionaries (cont.)

◮ contains (syntactic and semantic) restrictions that a word places on other words
  ◮ e.g., subcategorization information: give needs a giver, a person given to, and an object that is given
  ◮ e.g., selectional restrictions: if X is eating, then X must be animate
◮ may also contain frequency information
◮ can be hierarchically organized, e.g.:
  ◮ all nouns have person, number, and gender
  ◮ verbs (unless irregular) form the past tense by adding -ed.
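The last example of hierarchical organization can be sketched with class inheritance, used here as a stand-in for whatever inheritance mechanism a real lexicon provides (an assumption). In this minimal Python sketch the general Verb class states the default past-tense rule once, and only irregular verbs store their own form; the class names are hypothetical, and spelling changes such as consonant doubling are deliberately ignored.

```python
class Verb:
    """General verb class: the default past-tense rule is stated once, here."""

    def __init__(self, lemma: str):
        self.lemma = lemma

    def past(self) -> str:
        # Default: add -ed (spelling changes such as stop -> stopped are ignored).
        return self.lemma + "ed"


class IrregularVerb(Verb):
    """Irregular verbs inherit from Verb but override the default rule."""

    def __init__(self, lemma: str, past_form: str):
        super().__init__(lemma)
        self.past_form = past_form

    def past(self) -> str:
        return self.past_form


lexicon = [Verb("turn"), Verb("talk"), IrregularVerb("speak", "spoke")]
print([verb.past() for verb in lexicon])  # ['turned', 'talked', 'spoke']
```

The design point is that a regular verb's entry needs no past-tense information at all; it inherits the rule.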


What dictionary entries might look like

◮ : button

  : noun : no : yes G: Knopf

◮ : knowledge

  : noun : no : no G: Wissen, Kenntnisse

◮ There can be extra rules which tell you whether to choose Wissen or Kenntnisse.


A dictionary entry with frequency

◮ : knowledge

  : noun : no : no G: Wissen: 80%, Kenntnisse: 20%

◮ Probabilities can be derived from various machine learning techniques → to be discussed later.
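When no rule settles the choice, the simplest use of such frequencies is to pick the most frequent translation. A minimal Python sketch, assuming a hypothetical dict layout for the entry: it always returns Wissen for knowledge, so it would pick the wrong word whenever the context calls for Kenntnisse, which is what context-sensitive rules are meant to catch.

```python
# Toy entry for "knowledge" with the frequencies shown above
# (the dictionary layout is a hypothetical choice).
KNOWLEDGE = {
    "english": "knowledge",
    "pos": "noun",
    "german": {"Wissen": 0.80, "Kenntnisse": 0.20},
}


def most_likely_translation(entry: dict) -> str:
    """Return the target word with the highest stored frequency."""
    candidates = entry["german"]
    return max(candidates, key=candidates.get)


print(most_likely_translation(KNOWLEDGE))  # Wissen
```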


Transformer approaches

◮ Transformer architectures transform input sentences from one language into another.

◮ They consist of
  ◮ a grammar for the source/input language
  ◮ a source-to-target language dictionary
  ◮ source-to-target language rules

◮ Note that there is no grammar for the target language, only mappings from the source language.


An example of the transformer approach

We’ll work through a German-to-English example.

(3) a. Drehen Sie den Knopf eine Position zurück.
    b. Turn the button back one position.

1. Using the grammar, assign parts of speech:

(4) Drehen  Sie    den      Knopf  eine     Position  zurück.
    verb    pron.  article  noun   article  noun      prep.

2. Using the grammar, give the sentence a (basic) structure:

(5) Drehen Sie [den Knopf] [eine Position] zurück.
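Steps 1 and 2 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the tag table simply hard-codes the tags in (4), and a single rule (an article followed by a noun forms one unit) stands in for the source-language grammar. The names TAGS, tag, bracket, and render are hypothetical.

```python
# Hypothetical toy lexicon: the part-of-speech tags from (4).
TAGS = {
    "drehen": "verb", "sie": "pron.", "den": "article", "knopf": "noun",
    "eine": "article", "position": "noun", "zurück": "prep.",
}


def tag(tokens):
    """Step 1: assign each token its part of speech by lexicon lookup."""
    return [(token, TAGS[token.lower()]) for token in tokens]


def bracket(tagged):
    """Step 2: group every 'article noun' sequence into one unit."""
    units, i = [], 0
    while i < len(tagged):
        if (i + 1 < len(tagged)
                and tagged[i][1] == "article"
                and tagged[i + 1][1] == "noun"):
            units.append([tagged[i][0], tagged[i + 1][0]])
            i += 2
        else:
            units.append(tagged[i][0])
            i += 1
    return units


def render(units):
    """Show the structure in the bracket notation of (5)."""
    return " ".join("[" + " ".join(u) + "]" if isinstance(u, list) else u
                    for u in units)


sentence = "Drehen Sie den Knopf eine Position zurück".split()
print(render(bracket(tag(sentence))))
# Drehen Sie [den Knopf] [eine Position] zurück
```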


An example (cont.)

3. Using the dictionary, find the target language words:

(6) Drehen  Sie  [den  Knopf]   [eine  Position]  zurück.
    turn    you  [the  button]  [one   position]  back

4. Using the source-to-target rules, reorder, combine, eliminate, or add target language words, e.g.,
  ◮ 'turn' and 'back' form one unit.
  ◮ because 'Drehen . . . zurück' is a command, in English it is expressed without 'you'.

⇒ End result: Turn back the button one position.
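Steps 3 and 4 can likewise be sketched in Python. This is a minimal illustration, not the system the slides describe: the dictionary contents, the function names, and both rules are hypothetical and hard-wired to the pattern of (3), and the input is the bracketed structure from step 2.

```python
# Hypothetical toy German-to-English dictionary covering only the words in (6).
DICTIONARY = {
    "drehen": "turn", "sie": "you", "den": "the", "knopf": "button",
    "eine": "one", "position": "position", "zurück": "back",
}

# The bracketed structure from step 2, i.e., (5).
STRUCTURE = ["Drehen", "Sie", ["den", "Knopf"], ["eine", "Position"], "zurück"]


def lookup(units):
    """Step 3: replace every source word by its dictionary translation."""
    return [[DICTIONARY[w.lower()] for w in u] if isinstance(u, list)
            else DICTIONARY[u.lower()]
            for u in units]


def apply_rules(source, target):
    """Step 4: two source-to-target rules, hard-wired to this sentence pattern."""
    target = list(target)
    # Rule A: a verb-initial clause with 'Sie' right after the verb is treated
    # as a command (questions like 'Drehen Sie ...?' are ignored): drop 'you'.
    if isinstance(source[1], str) and source[1].lower() == "sie":
        del target[1]
    # Rule B: 'drehen ... zurück' is one unit, so move 'back' next to 'turn'.
    if isinstance(source[-1], str) and source[-1].lower() == "zurück":
        particle = target.pop()
        target.insert(1, particle)
    return target


def realize(target):
    """Flatten the units into a capitalized English sentence."""
    words = []
    for unit in target:
        words.extend(unit if isinstance(unit, list) else [unit])
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."


print(realize(apply_rules(STRUCTURE, lookup(STRUCTURE))))
# Turn back the button one position.
```

The output reproduces the slide's end result rather than the human translation in (3b): like the transformer system itself, the sketch has no English grammar, so nothing steers it toward placing back after the object.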


Transformers: Less than meets the eye

◮ By their very nature, transformer systems are non-reversible because they lack a target language grammar. If we have a German-to-English translation system, for example, we cannot use it to translate from English to German.

◮ However, as these systems do not require sophisticated knowledge of the target language, they are usually very robust, i.e., they will return a result for nearly any input sentence.

Linguistic knowledge-based systems

◮ Linguistic knowledge-based systems include knowledge of both the source and the target languages.

◮ We will look at direct transfer systems and then at the more specific instance of interlinguas:

  ◮ Direct transfer systems
  ◮ Interlinguas

Direct transfer systems

A direct transfer system consists of:

◮ A source language grammar
◮ A target language grammar
◮ Rules relating the source language underlying representation to the target language underlying representation

Direct transfer systems (cont.)

◮ A direct transfer system has a transfer component which relates a source language representation to a target language representation.

◮ This component can also be called a comparative grammar.

◮ We'll walk through the following French-to-English example:

(7) Londres plaît       à  Sam.
    London  is pleasing to Sam
    'Sam likes London.'

Steps in a transfer system

  • 1. The source language grammar analyzes the input and puts it into an underlying representation (UR):
     Londres plaît à Sam → Londres plaire Sam (source UR)

  • 2. The transfer component relates this source language UR (French UR) to a target language UR (English UR):

     French UR      English UR
     X plaire Y     Eng(Y) like Eng(X)

     (where Eng(X) means the English translation of X)

     Londres plaire Sam (source UR) → Sam like London (target UR)

  • 3. The target language grammar turns the target language UR into an actual target language sentence:
     Sam like London → Sam likes London.
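The three steps can be sketched as three small Python functions, one per component of a direct transfer system. This is a toy under stated assumptions: the `UR` class, the lexicon `ENG`, and the function names are hypothetical; each "grammar" handles only this one sentence pattern; and the UR's named arguments stand in for the essentially unordered representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UR:
    """Underlying representation: a predicate and two named arguments."""
    pred: str
    arg1: str
    arg2: str


# Hypothetical bilingual lexicon implementing Eng(X).
ENG = {"Londres": "London", "Sam": "Sam"}


def analyze_french(sentence):
    """Step 1 (toy source grammar): only covers 'X plaît à Y.'"""
    x, verb, prep, y = sentence.rstrip(".").split()
    if (verb, prep) != ("plaît", "à"):
        raise ValueError("toy grammar only covers 'X plaît à Y.'")
    return UR("plaire", x, y)


def transfer(ur):
    """Step 2 (comparative grammar): X plaire Y -> Eng(Y) like Eng(X)."""
    if ur.pred != "plaire":
        raise ValueError(f"no transfer rule for {ur.pred!r}")
    return UR("like", ENG[ur.arg2], ENG[ur.arg1])


def generate_english(ur):
    """Step 3 (toy target grammar): subject-verb-object, assuming a
    third-person singular subject so that like -> likes."""
    return f"{ur.arg1} {ur.pred}s {ur.arg2}."


source_ur = analyze_french("Londres plaît à Sam.")  # UR(pred='plaire', arg1='Londres', arg2='Sam')
target_ur = transfer(source_ur)                     # UR(pred='like', arg1='Sam', arg2='London')
print(generate_english(target_ur))                  # Sam likes London.
```

Keeping the three stages separate means a better English grammar (real agreement, tense) could replace `generate_english` without touching the French analysis or the transfer rule.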

Things to note about transfer systems

◮ The transfer mechanism is essentially reversible; e.g., the plaire rule works in both directions (at least in theory).

◮ Because we have a separate target language grammar, we can ensure that the rules of English apply: like → likes.

◮ Word order is handled differently than with transformers: the URs are essentially unordered.

◮ The underlying representation can be at various levels of abstraction (words, syntactic trees, meaning representations, etc.); we will return to this with the translation triangle.
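To illustrate how the plaire rule can work in both directions, here is a sketch in which the rule is stated once, as a pair of patterns, and applied either way. The pattern notation, `RULE`, `LEXICON`, `VARIABLES`, and `apply_rule` are assumptions made for illustration, and URs are written as tuples only for convenience.

```python
# The plaire rule stated once: (French UR pattern, English UR pattern).
# X and Y are variables; every other item must match literally.
RULE = (("X", "plaire", "Y"), ("Y", "like", "X"))
# Hypothetical (French, English) word pairs, also usable in both directions.
LEXICON = [("Londres", "London"), ("Sam", "Sam")]
VARIABLES = {"X", "Y"}


def apply_rule(ur, direction="fr->en"):
    """Rewrite a UR with RULE, French-to-English or English-to-French."""
    if direction == "fr->en":
        src_pat, tgt_pat = RULE
        lex = dict(LEXICON)
    else:
        tgt_pat, src_pat = RULE
        lex = {en: fr for fr, en in LEXICON}
    if len(ur) != len(src_pat):
        raise ValueError("rule does not match")
    bindings = {}
    for slot, word in zip(src_pat, ur):
        if slot in VARIABLES:
            bindings[slot] = lex[word]  # translate the argument
        elif slot != word:
            raise ValueError("rule does not match")
    return tuple(bindings.get(slot, slot) for slot in tgt_pat)


print(apply_rule(("Londres", "plaire", "Sam")))         # ('Sam', 'like', 'London')
print(apply_rule(("Sam", "like", "London"), "en->fr"))  # ('Londres', 'plaire', 'Sam')
```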

Caveat about reversibility

◮ Reversible rules seem highly desirable, and in general they are, but we may not always want them.

◮ e.g., Dutch aanvangen should be translated into English as begin, but English begin should be translated into Dutch as beginnen (not aanvangen).
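A sketch of the alternative: keep a separate lexicon for each direction instead of one reversible table. The dictionary and function names are hypothetical, and the Dutch-to-English table adds beginnen → begin, which the slide does not list. Mechanically inverting the Dutch-to-English table yields two Dutch candidates for begin, while the separately written English-to-Dutch table picks beginnen as required.

```python
# Dutch -> English: both verbs come out as 'begin'.
NL_TO_EN = {"aanvangen": "begin", "beginnen": "begin"}

# English -> Dutch, stated separately: 'begin' should become 'beginnen'.
EN_TO_NL = {"begin": "beginnen"}


def invert(lexicon):
    """Naively reverse a lexicon, collecting every source word per target word."""
    inverse = {}
    for src, tgt in lexicon.items():
        inverse.setdefault(tgt, []).append(src)
    return inverse


print(invert(NL_TO_EN))   # {'begin': ['aanvangen', 'beginnen']}, i.e. ambiguous
print(EN_TO_NL["begin"])  # beginnen
```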

Levels of abstraction

◮ Transfer can take place at different levels of abstraction. So far we have looked at URs that represent only word information.

◮ We can do a full syntactic analysis, which tells us how the words in a sentence relate to each other.

◮ Or we can do only a partial syntactic analysis, such as representing the dependencies between words.

Czech-English example

(8) Kaufman & Broad odmítla  institucionální investory jmenovat.
    Kaufman & Broad declined institutional   investors to name/identify
    'Kaufman & Broad refused to name the institutional investors.'

Example taken from Čmejrek, Cuřín, and Havelka (2003).

◮ They find the base forms of words (e.g., odmítnout 'to decline' instead of odmítla 'declined').

◮ They find which words depend on which other words and represent this in a tree (e.g., the noun investory depends on the verb jmenovat).

◮ This dependency tree is then converted to English (comparative grammar) and reordered as appropriate.

Dependency tree for Czech-English example

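[Figure: dependency tree for (8); each node gives the English lemma, then the Czech lemma]

decline (odmítnout)
  & (&)
    Kaufman (Kaufman)
    Broad (Broad)
  name (jmenovat)
    investor (investor)
      institutional (institucionální)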
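The convert-and-reorder step can be sketched on this tree in Python. Everything here is a hypothetical toy, not the cited system: the `Node` class, the role labels (`subj`, `comp`, `obj`, `mod`, `conj`), the exact tree structure built for sentence (8), the lexicon `CS_TO_EN`, and the simple English ordering rule. The sketch stops at English lemmas in English order; producing the inflected sentence (declined, to, the, investors) would be the target grammar's job.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lemma: str
    role: str = "root"  # hypothetical dependency labels
    children: list = field(default_factory=list)


# Hypothetical Czech -> English lexicon (lemmas only).
CS_TO_EN = {
    "odmítnout": "decline", "jmenovat": "name", "investor": "investor",
    "institucionální": "institutional", "Kaufman": "Kaufman",
    "Broad": "Broad", "&": "&",
}


def transfer(node):
    """Comparative grammar: map each Czech lemma to an English lemma,
    keeping the dependency structure unchanged."""
    return Node(CS_TO_EN[node.lemma], node.role,
                [transfer(child) for child in node.children])


def linearize_en(node):
    """Toy English word order: subjects and modifiers precede their head,
    objects and complements follow it; '&' goes between its conjuncts."""
    if node.lemma == "&":
        left, right = node.children
        return linearize_en(left) + ["&"] + linearize_en(right)
    before = [w for c in node.children if c.role in ("subj", "mod")
              for w in linearize_en(c)]
    after = [w for c in node.children if c.role in ("obj", "comp")
             for w in linearize_en(c)]
    return before + [node.lemma] + after


# Kaufman & Broad odmítla institucionální investory jmenovat.
czech_tree = Node("odmítnout", "root", [
    Node("&", "subj", [Node("Kaufman", "conj"), Node("Broad", "conj")]),
    Node("jmenovat", "comp", [
        Node("investor", "obj", [Node("institucionální", "mod")]),
    ]),
])

print(" ".join(linearize_en(transfer(czech_tree))))
# Kaufman & Broad decline name institutional investor
```

In the Czech sentence the object investory precedes jmenovat; the English ordering rule places the object after its verb, which is the reordering mentioned in the example.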

Interlinguas

◮ Ideally, we could use an interlingua: a language-independent representation of meaning.

◮ Benefit: to add a new language to your MT system, you merely have to provide mapping rules between that language and the interlingua; you can then translate between it and any other language in your system.

◮ What your interlingua looks like depends on your goals; an example for 'I shot the sheriff.' is shown on the following slide.
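The benefit can be made concrete by counting components. The sketch assumes one comparative grammar per ordered language pair for direct transfer, n(n-1) for n languages (half that if every rule were fully reversible), versus one mapping into and one out of the interlingua per language, 2n. The function names are hypothetical.

```python
def transfer_components(n):
    """Direct transfer: one comparative grammar per ordered pair of languages."""
    return n * (n - 1)


def interlingua_components(n):
    """Interlingua: one mapping into and one out of the interlingua per language."""
    return 2 * n


for n in (2, 5, 10):
    print(n, transfer_components(n), interlingua_components(n))
# 2 2 4
# 5 20 10
# 10 90 20
```

Under these assumptions, adding an (n+1)th language costs 2n new comparative grammars with direct transfer but only 2 new mappings with an interlingua.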

Interlingua example

[Figure: interlingua representation of 'I shot the sheriff.', drawn as a feature structure. Recoverable values: wound, gun, past, maybe (the shooting event); speaker, first, sg, ? (the shooter); sheriff, yes, third, singular, ?, yes, yes, kind of job: officer (the person shot).]

Interlingual problems

◮ What exactly should be represented in the interlingua?

  ◮ e.g., English corner = Spanish rincón ('inside corner') or esquina ('outside corner')

◮ A fine-grained interlingua can require extra (unnecessary) work:

  ◮ e.g., Japanese distinguishes older brother from younger brother, so we have to disambiguate English brother to put it into the interlingua. Then, if we translate into French, we have to ignore the disambiguation and translate it as frère, which simply means 'brother'.
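A sketch of why a fine-grained interlingua creates extra work: generation tables keyed by an interlingua concept plus an optional feature. The concept names, feature values, `GENERATE` table, and `generate` function are hypothetical; ani (older brother) and otōto (younger brother) are the plain Japanese terms, and honorific forms are ignored.

```python
# Hypothetical generation tables: (concept, feature) -> word.
# A feature of None means the language ignores that distinction.
GENERATE = {
    "en": {("BROTHER", None): "brother", ("CORNER", None): "corner"},
    "fr": {("BROTHER", None): "frère"},
    "ja": {("BROTHER", "older"): "ani", ("BROTHER", "younger"): "otōto"},
    "es": {("CORNER", "inside"): "rincón", ("CORNER", "outside"): "esquina"},
}


def generate(lang, concept, feature=None):
    """Pick a word for an interlingua concept in the given language."""
    table = GENERATE[lang]
    if (concept, None) in table:  # distinction not made: the feature is wasted work
        return table[(concept, None)]
    if feature is None:
        raise ValueError(f"{lang} needs a feature to express {concept}")
    return table[(concept, feature)]


# Analysis of English 'brother' must already decide older vs. younger,
# because Japanese needs it; French generation then discards that decision.
print(generate("ja", "BROTHER", "older"))  # ani
print(generate("fr", "BROTHER", "older"))  # frère
print(generate("es", "CORNER", "inside"))  # rincón
```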

The translation triangle

[Figure: the translation triangle, relating Source, Target, Transfer System, and Interlingua along two dimensions: Depth of Analysis, and Size of comparative grammar between languages.]

Machine learning

◮ Instead of telling the MT system how to translate, we might try a machine learning approach: the computer learns how to translate from example translations.

◮ For this, we need

  ◮ examples of translations as training data, and
  ◮ a way of learning from that data.

Using frequency (statistical methods)

◮ We can look at how often a source language word is translated as a given target language word (i.e., the frequency of each translation) and choose the most frequent translation.

◮ But how can we tell what a word is being translated as? There are two different cases:

  ◮ We are told what each word is translated as: text alignment
  ◮ We are not told what each word is translated as: use a bag of words
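For the first case, where the alignment is given, counting translations and choosing the most frequent one might look like this in Python. The aligned word pairs are invented toy data, and the function names are hypothetical; the sketch assumes word alignment has already been done.

```python
from collections import Counter, defaultdict


def translation_counts(links):
    """Count how often each source word is aligned to each target word.
    `links` is an iterable of (source_word, target_word) pairs."""
    counts = defaultdict(Counter)
    for src, tgt in links:
        counts[src][tgt] += 1
    return counts


def most_frequent_translation(counts, src):
    """Choose the target word most often aligned with `src` (None if unseen)."""
    if not counts.get(src):
        return None
    return counts[src].most_common(1)[0][0]


# Hypothetical aligned word pairs from an English-Spanish bitext.
links = [("run", "correr"), ("run", "correr"), ("run", "dirigir"),
         ("house", "casa")]
counts = translation_counts(links)
print(most_frequent_translation(counts, "run"))  # correr
# Relative frequency of the chosen translation among all translations of 'run'.
print(counts["run"]["correr"] / sum(counts["run"].values()))
```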

Text alignment

Sometimes humans have provided informative training data:

◮ sentence alignment
◮ word alignment

The process of text alignment can also be automated and then used to train an MT system.

Sentence alignment

◮ sentence alignment = determining which source language sentences align with which target language ones (what we assumed in the bag of words example).

◮ Intuitively easy, but it can be difficult in practice, since different languages have different punctuation conventions.
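One way to automate sentence alignment is to compare sentence lengths, since a long sentence tends to be translated as a long sentence. The Python sketch below is a simplified length-based aligner in the spirit of Gale and Church's method, not their probabilistic model: it uses dynamic programming over 1-1, 1-0, 0-1, 2-1, and 1-2 groupings, scores each grouping by the absolute difference in character length, and uses arbitrary, assumed penalty values `SKIP` and `MERGE`.

```python
SKIP = 10  # extra cost for a sentence with no counterpart (assumed value)
MERGE = 5  # extra cost for a 2-1 or 1-2 grouping (assumed value)
MOVES = {(1, 1): 0, (1, 0): SKIP, (0, 1): SKIP, (2, 1): MERGE, (1, 2): MERGE}


def align_sentences(src, tgt):
    """Return a list of (source_indices, target_indices) groupings."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    # Every move increases i or j, so row-major order visits each cell
    # only after all of its possible predecessors.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in MOVES.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                src_len = sum(len(s) for s in src[i:ni])
                tgt_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + abs(src_len - tgt_len) + penalty
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end to recover the groupings.
    groups, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        groups.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return groups[::-1]


print(align_sentences(["I came.", "I saw."], ["Ich kam und sah."]))
# [([0, 1], [0])]
```

Real aligners also use anchors such as paragraph boundaries and shared names or numbers, which this sketch ignores.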

Word alignment

◮ word alignment = determining which source language words align with which target language ones

◮ Much harder than sentence alignment to do automatically.

◮ But if it has already been done for us, it gives us good information about a word's translation equivalent.

Different word alignments

◮ One word can map to one word or to multiple words. Likewise, sometimes it is best for multiple words to align with multiple words.

◮ English-Russian examples:

  ◮ one-to-one: khorosho = well
  ◮ one-to-many: kniga = the book
  ◮ many-to-one: to take a walk = gulyat'
  ◮ many-to-many: at least = khotya by ('although if/would')
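Word alignments are often stored as a set of links between word positions. The Python sketch below groups such links into connected blocks and labels each block with the categories above. The function name and the index-pair representation are assumptions for illustration; "one" and "many" refer to the left-hand and right-hand sides as the examples are written.

```python
def alignment_blocks(links):
    """Group (left_index, right_index) links into connected blocks and label
    each block one-to-one, one-to-many, many-to-one, or many-to-many."""
    remaining = set(links)
    blocks = []
    while remaining:
        stack = [remaining.pop()]
        block = set(stack)
        while stack:
            left, right = stack.pop()
            # Links that share a word with the current link join the block.
            linked = {(s, t) for (s, t) in remaining if s == left or t == right}
            remaining -= linked
            block |= linked
            stack.extend(linked)
        n_left = len({s for s, _ in block})
        n_right = len({t for _, t in block})
        label = (("one" if n_left == 1 else "many") + "-to-"
                 + ("one" if n_right == 1 else "many"))
        blocks.append((sorted(block), label))
    return sorted(blocks)


# kniga = the book: left word 0 links to right words 0 ('the') and 1 ('book').
print(alignment_blocks({(0, 0), (0, 1)}))
# [([(0, 0), (0, 1)], 'one-to-many')]

# to take a walk = gulyat': all four left words link to the single right word.
print(alignment_blocks({(0, 0), (1, 0), (2, 0), (3, 0)}))
# [([(0, 0), (1, 0), (2, 0), (3, 0)], 'many-to-one')]
```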

Calculating probabilities

◮ With word alignments, it is relatively easy to calculate

probabilities.

◮ e.g., What is the probability that run translates as correr

in Spanish?

  • 1. Count up how many times run appears in the English

part of your bi-text. e.g., 500 times

  • 2. Out of all those times, count up how many times it was

translated as (i.e., aligns with) correr. e.g., 275 (out of 500) times.

38 / 66

slide-103
SLIDE 103

Calculating probabilities

◮ With word alignments, it is relatively easy to calculate probabilities.

◮ e.g., what is the probability that run translates as correr in Spanish?

  1. Count up how many times run appears in the English part of your bi-text, e.g., 500 times.
  2. Out of all those times, count up how many times it was translated as (i.e., aligned with) correr, e.g., 275 (out of 500) times.
  3. Divide to get a probability: 275/500 = 0.55, or 55%.

38 / 66
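The three steps amount to a relative-frequency estimate, P(correr | run) = count(run aligned with correr) / count(run). A minimal Python sketch, assuming the aligned bi-text has already been reduced to a list of (English word, Spanish word) links with exactly one link per occurrence of run; the toy data only reproduce the slide's example counts, and `<other>` is a placeholder for whatever other translations occur.

```python
from collections import Counter


def translation_prob(aligned_pairs, source, target):
    """Estimate P(target | source) by relative frequency over aligned word pairs.

    aligned_pairs: iterable of (source_word, target_word) links; this sketch
    assumes exactly one link per occurrence of a source word.
    """
    source_counts = Counter()
    pair_counts = Counter()
    for s, t in aligned_pairs:
        source_counts[s] += 1        # step 1: occurrences of the source word
        pair_counts[(s, t)] += 1     # step 2: occurrences aligned with this target word
    if source_counts[source] == 0:
        return 0.0                   # unseen source word: no evidence at all
    return pair_counts[(source, target)] / source_counts[source]  # step 3: divide


# Toy data matching the slide: run occurs 500 times, 275 of them aligned with correr.
pairs = [("run", "correr")] * 275 + [("run", "<other>")] * 225
print(translation_prob(pairs, "run", "correr"))  # 0.55
```

If one occurrence of run could carry several links (a one-to-many alignment), step 1 would have to count occurrences separately from links; the sketch sidesteps that by assumption.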

slide-107
SLIDE 107

Word alignment difficulties

◮ Knowing how words align in the training data will not tell us how to handle the new data we see.

◮ We may have many cases where the verb fool is aligned with the Spanish engañar ‘to fool’,

◮ but we may then encounter the noun in a fool, where the translation should be tonto (male) or tonta (female).

◮ So word alignment only gives us frequency counts; we still have to do something intelligent with them.

39 / 66

slide-109
SLIDE 109

Word alignment difficulties (cont.)

◮ Sometimes it is not even clear that word alignment is possible.

(9) Ivan aspirant.
    Ivan graduate student
    ‘Ivan is a graduate student.’

◮ What does “is” align with?
◮ In cases like this, a word can be mapped to a “null” element in the other language.

40 / 66
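A minimal sketch of how a null alignment can be recorded, assuming a one-directional representation in which each English word points to at most one Russian position (a simplification; the slides do not fix a representation). Aligning a as well as is to NULL, and both graduate and student to aspirant, is our reading of example (9): Russian has no articles and normally omits the present-tense copula.

```python
NULL = None  # stands in for the "null" element on the Russian side

english = ["Ivan", "is", "a", "graduate", "student"]
russian = ["Ivan", "aspirant"]

# Each English position points to one Russian position, or to NULL when
# nothing in the Russian sentence corresponds to it.
alignment = {0: 0, 1: NULL, 2: NULL, 3: 1, 4: 1}


def null_aligned(words, alignment):
    """Return the words that are aligned to the null element, in sentence order."""
    return [words[i] for i, j in sorted(alignment.items()) if j is NULL]


for i, j in sorted(alignment.items()):
    print(english[i], "->", "NULL" if j is NULL else russian[j])
```

Treating NULL as one more possible partner lets the counting methods from the previous slides handle words like is without special cases.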

slide-112
SLIDE 112

The “bag of words” method

◮ What if we’re not given word alignments?
◮ How can we tell which English words are translated as which German words if we are only given an English text and a corresponding German text?

◮ We can treat each sentence as a bag of words = an unordered collection of words.

◮ If word A appears in a sentence, then we record all of the words in the corresponding sentence in the other language as appearing with it.

41 / 66

slide-114
SLIDE 114

Example for bag of words method

◮ English: He speaks Russian well.
◮ Russian: On khorosho govorit po-russki.

Eng    Rus          Eng      Rus
He     On           speaks   On
He     khorosho     speaks   khorosho
He     govorit      ...      ...
He     po-russki    well     po-russki

The idea is that, over thousands or even millions of sentences, He will tend to appear more often with On, speaks with govorit, and so on.

42 / 66
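The pairings in this table can be generated mechanically. A minimal Python sketch of bag-of-words counting, assuming sentence-aligned input and naive whitespace tokenization with edge punctuation stripped (the function names are hypothetical); the second sentence pair shows how He and On start to pull ahead of the other pairings as data accumulate.

```python
from collections import Counter


def tokenize(sentence):
    """Whitespace tokenization with edge punctuation stripped (a deliberate simplification)."""
    return [w.strip(".,!?") for w in sentence.split()]


def bag_of_words_pairs(bitext):
    """Count how often each source word co-occurs with each target word.

    bitext: list of (source_sentence, target_sentence) pairs known to be
    translations of each other. Word order inside a sentence is ignored.
    """
    counts = Counter()
    for src, tgt in bitext:
        for s in tokenize(src):
            for t in tokenize(tgt):
                counts[(s, t)] += 1   # every word is recorded with every word opposite it
    return counts


bitext = [
    ("He speaks Russian well.", "On khorosho govorit po-russki."),
    ("He is nice.", "On simpatich’nyi."),
]
counts = bag_of_words_pairs(bitext)
print(counts[("He", "On")], counts[("He", "govorit")])  # 2 1
```

With only the first sentence pair, every English word is paired with every Russian word exactly once (4 × 4 = 16 pairs); only repetition across sentences separates the real translations from the noise.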

slide-116
SLIDE 116

Example for bag of words method

Calculating probabilities: sentence 1

So, for He in He speaks Russian well/On khorosho govorit po-russki, we do the following:

  1. Count up the number of Russian words: 4.
  2. Assign each word an equal probability of being the translation: 1/4 = 0.25, or 25%.

43 / 66

slide-120
SLIDE 120

Example for bag of words method

Calculating probabilities: sentence 2

If we also have He is nice./On simpatich’nyi., then for He, we do the following:

  1. Count up the number of possible translation words: 4 from the first sentence, 2 from the second = 6 total.
  2. Count up how many of those 6 words are On: 2, so the probability is 2/6 = 1/3 = 0.33, or 33%.

Every other word has probability 1/6 = 0.17, or 17%, so On is clearly the best translation for He.

44 / 66
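Both calculations follow the same recipe: pool every Russian word from every sentence pair whose English side contains He, then divide each word's count by the size of the pool. A Python sketch of that recipe, under the same simplifying assumptions as before (naive tokenization, hypothetical function names); it reproduces 1/4 for sentence 1 alone and 2/6 for On once sentence 2 is added.

```python
from collections import Counter


def tokenize(sentence):
    """Whitespace tokenization with edge punctuation stripped (a simplification)."""
    return [w.strip(".,!?") for w in sentence.split()]


def bag_of_words_probs(bitext, source_word):
    """Estimate translation probabilities for source_word as on the slides.

    Every target word in every sentence pair whose source side contains
    source_word is one candidate; a word's probability is its share of all candidates.
    """
    pool = Counter()
    for src, tgt in bitext:
        if source_word in tokenize(src):
            pool.update(tokenize(tgt))
    total = sum(pool.values())
    if total == 0:
        return {}  # source_word never seen
    return {word: count / total for word, count in pool.items()}


bitext = [
    ("He speaks Russian well.", "On khorosho govorit po-russki."),
    ("He is nice.", "On simpatich’nyi."),
]
print(bag_of_words_probs(bitext[:1], "He"))  # sentence 1 alone: each word gets 0.25
probs = bag_of_words_probs(bitext, "He")
best = max(probs, key=probs.get)
print(best, round(probs[best], 2))  # On 0.33
```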

slide-122
SLIDE 122

What makes MT hard?

We’ve seen how MT systems can work, but MT is a very difficult task because languages are vastly different. They differ:

◮ Lexically: in the words they use
◮ Syntactically: in the constructions they allow
◮ Semantically: in the way meanings work
◮ Pragmatically: in what readers take from a sentence

In addition, a good deal of real-world knowledge goes into a translation.

45 / 66

slide-126
SLIDE 126

Lexical ambiguity

Words can be lexically ambiguous = have multiple meanings.

◮ bank can be a financial institution or a place along a river.

◮ can can be a cylindrical object, the act of putting something into that cylinder (e.g., John cans tuna.), or a modal verb like must, might, or should.

⇒ We have to know which meaning is intended before we translate.

46 / 66

slide-130
SLIDE 130

How words divide up the world (lexical issues)

Words don’t line up exactly between languages. Within a language, we have synonyms, hyponyms, and hypernyms.

◮ sofa and couch are synonyms (mean the same thing)
◮ sofa is a hyponym (more specific term) of furniture
◮ furniture is a hypernym (more general term) of sofa

47 / 66

slide-132
SLIDE 132

Synonyms

Often we find synonyms between two languages (to the extent that there are synonyms within a language at all):

◮ English book = Russian kniga
◮ English music = Spanish música

But words don’t always line up exactly between languages.

48 / 66

slide-138
SLIDE 138

Hypernyms and Hyponyms

◮ English hypernyms = words that are more general in English than their counterparts in other languages

  ◮ English know is rendered by the French savoir (‘to know a fact’) and connaître (‘to know, be acquainted with, a person or thing’)
  ◮ English library is German Bücherei if it is open to the public, but Bibliothek if it is intended for scholarly work.

◮ English hyponyms = words that are more specific in English than their foreign-language counterparts.

  ◮ The German word Berg can mean either hill or mountain in English.
  ◮ The Russian word ruka can mean either hand or arm.

49 / 66
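One way to see the problem: a dictionary entry for a word like know or library is really a choice among several target words, and something outside the dictionary has to supply the choice. A Python sketch with a hypothetical lexicon built from the slide's examples; the sense labels such as "fact" or "public" are informal tags invented for this illustration, not a standard sense inventory.

```python
# Hypothetical toy lexicon. Each (word, target language) entry lists the
# possible translations keyed by an informal sense label chosen for this sketch.
LEXICON = {
    ("know", "fr"): {"fact": "savoir", "person or thing": "connaître"},
    ("library", "de"): {"public": "Bücherei", "scholarly": "Bibliothek"},
    ("Berg", "en"): {"small": "hill", "large": "mountain"},
}


def translate_word(word, lang, sense=None):
    """Return the translation of word into lang for the given sense.

    Raises ValueError when the word has several translations and no sense is
    given: the lexicon alone cannot make that choice.
    """
    options = LEXICON[(word, lang)]
    if len(options) == 1:
        return next(iter(options.values()))
    if sense is None:
        raise ValueError(f"{word!r} needs a sense to pick among {sorted(options.values())}")
    return options[sense]


print(translate_word("know", "fr", "fact"))          # savoir
print(translate_word("library", "de", "scholarly"))  # Bibliothek
```

The hard part of MT is exactly the `sense` argument: the sketch simply demands it, whereas a real system has to infer it from context.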

slide-139
SLIDE 139

Semantic overlap

And then there’s just fuzziness, as in the following English and French correspondences:

◮ leg = étape (journey), jambe (human), pied (chair), patte (animal)
◮ foot = pied (human), patte (bird)
◮ paw = patte (animal)

50 / 66
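The overlap can be stored as a many-to-many relation with the context attached. In the Python sketch below (the data come from the slide; the function names are hypothetical), neither direction of the relation is a function: leg has four French counterparts and patte has three English ones, so a translation can only be picked once the context is known. Python's default string sort compares code points, which is why étape comes last.

```python
# The slide's English-French correspondences as (English, French, context) triples.
CORRESPONDENCES = [
    ("leg", "étape", "journey"),
    ("leg", "jambe", "human"),
    ("leg", "pied", "chair"),
    ("leg", "patte", "animal"),
    ("foot", "pied", "human"),
    ("foot", "patte", "bird"),
    ("paw", "patte", "animal"),
]


def french_for(english):
    """All French words that some sense of the English word maps to."""
    return sorted({fr for en, fr, _ in CORRESPONDENCES if en == english})


def english_for(french):
    """All English words that map to the French word in some context."""
    return sorted({en for en, fr, _ in CORRESPONDENCES if fr == french})


def translate(english, context):
    """Pick the French word for an English word once the context is known."""
    for en, fr, ctx in CORRESPONDENCES:
        if en == english and ctx == context:
            return fr
    return None


print(french_for("leg"))          # ['jambe', 'patte', 'pied', 'étape']
print(english_for("patte"))       # ['foot', 'leg', 'paw']
print(translate("leg", "chair"))  # pied
```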

slide-140
SLIDE 140

Venn diagram of semantic overlap

[Figure: Venn diagram of how English paw, foot, and leg overlap with French jambe, pied, patte, and étape; regions are labeled animal, chair, human, bird, and journey.]

51 / 66

slide-143
SLIDE 143

Lexical gaps

Sometimes there is no simple equivalent for a word in a language, and the word has to be translated with a more complex phrase. We call this a lexical gap or lexical hole.

◮ French gratiner means something like ‘to cook with a cheese coating’

◮ Hebrew stam means something like ‘I’m just kidding’ or ‘Nothing special.’

52 / 66

slide-146
SLIDE 146

Light verbs

Some verbs carry little meaning of their own; these are so-called light verbs:

◮ French faire une promenade is literally ‘make a walk,’ but it has the meaning of the English take a walk

◮ Dutch een poging doen ‘do an attempt’ means the same as the English make an attempt

53 / 66

slide-150
SLIDE 150

Idioms

And we often face idioms = expressions whose meaning is not made up of the meanings of the individual words.

◮ e.g., English kick the bucket

  ◮ approximately equivalent to the French casser sa pipe (‘break his/her pipe’)
  ◮ but we might want to translate it as mourir (‘die’)
  ◮ and we want to treat it differently than kick the table

54 / 66
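To treat kick the bucket differently from kick the table, a system must recognize the idiom as one unit before it looks up individual words. A minimal Python sketch of that step, assuming a hypothetical idiom table and greedy longest-match segmentation (one simple strategy among several, not the course's method):

```python
# Hypothetical idiom table: multiword expressions translated as a single unit.
IDIOMS = {
    ("kick", "the", "bucket"): "mourir",
}
MAX_IDIOM_LEN = max(len(key) for key in IDIOMS)


def segment(words):
    """Split a word list into translation units, preferring the longest idiom match."""
    units, i = [], 0
    while i < len(words):
        # Try the longest possible idiom starting at position i, down to two words.
        for n in range(min(MAX_IDIOM_LEN, len(words) - i), 1, -1):
            candidate = tuple(words[i:i + n])
            if candidate in IDIOMS:
                units.append(candidate)
                i += n
                break
        else:
            units.append((words[i],))  # no idiom starts here: keep a single word
            i += 1
    return units


print(segment("kick the bucket".split()))  # [('kick', 'the', 'bucket')]
print(segment("kick the table".split()))   # [('kick',), ('the',), ('table',)]
```

Because the table stores exact word sequences, inflected forms such as kicked the bucket would not match; handling them would need lemmatization or more flexible patterns, which this sketch does not attempt.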


Idiosyncrasies

Languages make idiosyncratic word choices, e.g., in which adjective combines with smoker:

◮ English heavy smoker

◮ French grand fumeur (‘large smoker’)

◮ German starker Raucher (‘strong smoker’)
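The examples above suggest that an adjective's translation can depend on the noun it modifies. One minimal way to model this is a collocation table consulted before the literal lexicon. The tables and the function `choose_adjective` below are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical lexicon: literal adjective translations per target language.
ADJECTIVES = {"heavy": {"fr": "lourd", "de": "schwer"}}

# Hypothetical collocation table: (adjective, noun, language) -> adjective to use.
COLLOCATIONS = {
    ("heavy", "smoker", "fr"): "grand",    # grand fumeur ('large smoker')
    ("heavy", "smoker", "de"): "starker",  # starker Raucher ('strong smoker')
}


def choose_adjective(adj, noun, lang):
    """Prefer a noun-specific collocation; otherwise use the literal translation."""
    return COLLOCATIONS.get((adj, noun, lang), ADJECTIVES[adj][lang])


print(choose_adjective("heavy", "smoker", "fr"))  # grand
print(choose_adjective("heavy", "box", "fr"))     # lourd
```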

Taboo words

There are taboo words = words which are “forbidden” in some way or in some circumstances (e.g., swear/curse words)

◮ You of course know several English examples. Note that the literal meanings of these words lack the emotive impact of the actual words.

◮ Other languages/cultures have different taboos, often revolving around death, body parts, bodily functions, disease, and religion.

◮ e.g., the word ‘skin’ is taboo in a Western Australian (Aboriginal) language (http://www.aija.org.au/online/ICABenchbook/BenchbookChapter5.pdf)

◮ Imagine encountering the word ‘skin’ in English and translating it without knowing this.

Structure and word order differences

◮ Word order (and syntactic structure) differs across languages.

◮ E.g., English has what is called subject-verb-object (SVO) order, as in (10).

(10) John (S) punched (V) Bill (O).

◮ In contrast, Japanese is SOV, Arabic is VSO, and Dyirbal (an Australian Aboriginal language) has free word order.

◮ MT systems have to account for these differences.
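The sketch below reorders constituents according to each language's basic order. It assumes the subject, verb and object have already been identified and translated; English words are kept here only for readability. It is a simplification. A free-word-order language like Dyirbal cannot be captured by a single pattern, and real languages have many exceptions (questions, topicalization, and so on).

```python
# Basic word orders from the slide, written as patterns over S, V and O.
BASIC_ORDER = {"English": "SVO", "Japanese": "SOV", "Arabic": "VSO"}


def reorder(subject, verb, obj, language):
    """Arrange (already translated) constituents in the language's basic order."""
    slots = {"S": subject, "V": verb, "O": obj}
    return [slots[role] for role in BASIC_ORDER[language]]


print(reorder("John", "punched", "Bill", "Japanese"))  # ['John', 'Bill', 'punched']
print(reorder("John", "punched", "Bill", "Arabic"))    # ['punched', 'John', 'Bill']
```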

More on word order differences

◮ Sometimes things are conceptualized differently in different languages, e.g.:

(11) a. His name is Jerome.
     b. Er  heißt             Jerome.   (German)
        He  goes-by-name-of   Jerome
     c. Il  s’       appelle  Jerome.   (French)
        He  himself  call     Jerome

◮ Words don’t really align here.

How syntactic grouping and meaning relate (Syntax/Semantics)

Even within a language, there are syntactic complications. We can have structural ambiguities = sentences that can be interpreted in more than one way.

(12) John saw the boy with the binoculars.

With the binoculars can describe either the boy or how John saw the boy.

◮ This difference in structure corresponds to a difference in what we think the sentence means, i.e., meaning is derived from the words and how they are grouped.

◮ Do we attempt to translate only one interpretation? Or do we try to preserve the ambiguity in the target language?
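The two readings of (12) use the same words grouped differently. The sketch below makes this concrete by encoding each reading as a tree of nested tuples. The simplified bracketings are illustrative and do not follow any particular grammar formalism.

```python
def bracket(tree):
    """Render a tree of nested tuples as a bracketed string."""
    if isinstance(tree, str):
        return tree
    return "[" + " ".join(bracket(child) for child in tree) + "]"


def leaves(tree):
    """Return the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree for word in leaves(child)]


# Reading 1: the PP attaches to the verb phrase (John used the binoculars).
instrument = ("John", ("saw", ("the", "boy"), ("with", ("the", "binoculars"))))
# Reading 2: the PP attaches to the noun phrase (the boy has the binoculars).
possession = ("John", ("saw", (("the", "boy"), ("with", ("the", "binoculars")))))

print(bracket(instrument))  # [John [saw [the boy] [with [the binoculars]]]]
print(bracket(possession))  # [John [saw [[the boy] [with [the binoculars]]]]]
print(leaves(instrument) == leaves(possession))  # True
```

A system that commits to one tree has implicitly chosen one interpretation before it translates.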

How language is used (Pragmatics)

Translation becomes even more difficult when we have to take context into account.

◮ Thank you is usually translated as merci in French, but it is translated as s’il vous plaît (‘please’) when accepting an offer.

◮ Can you drive a stick-shift? could be a request for you to drive my manual-transmission car, or it could simply be a question about your driving abilities.

Real-world knowledge

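◮ Sometimes we have to use real-world knowledge to figure out what a sentence means.

(13) Put the paper in the printer. Then switch it on.

◮ We know what it refers to only because we know that printers, not paper, can be switched on. This matters for translation: in French, for instance, the pronoun must agree in gender with its antecedent (imprimante ‘printer’ is feminine, papier ‘paper’ is masculine).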
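The sketch below uses a small table of world knowledge to pick the antecedent of it in (13). It then picks a French object pronoun, which must agree in gender with that antecedent. The tables and function names are hypothetical. Real systems would need far broader knowledge than a hand-written table.

```python
# Hypothetical world knowledge: can this kind of object be switched on?
CAN_BE_SWITCHED_ON = {"printer": True, "paper": False}
# French nouns and their grammatical gender.
FRENCH_NOUNS = {"printer": ("imprimante", "f"), "paper": ("papier", "m")}


def resolve_it(candidates):
    """Pick the antecedent of 'it' in 'switch it on'; None if still ambiguous."""
    plausible = [c for c in candidates if CAN_BE_SWITCHED_ON.get(c, False)]
    return plausible[0] if len(plausible) == 1 else None


def french_object_pronoun(antecedent):
    """Choose le/la from the antecedent's gender (ignores elision to l')."""
    _, gender = FRENCH_NOUNS[antecedent]
    return "la" if gender == "f" else "le"


it = resolve_it(["paper", "printer"])  # noun phrases mentioned in (13)
print(it, french_object_pronoun(it))   # printer la
```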

Ambiguity resolution

◮ If the source language involves ambiguous words/phrases, but the target language does not have the same ambiguity, we have to resolve the ambiguity before translation, e.g., the hyponyms/hypernyms we saw before.

◮ But sometimes we might want to preserve the ambiguity, note that there was ambiguity, or note that a whole range of meanings is available.

⇒ In the Bible, the Greek word hyper is used in 1 Corinthians 15:29; it can mean ‘over’, ‘for’, ‘on behalf of’, and so on. How you treat it affects how you treat the theological issue of salvation of the already dead. That is, people care deeply about how you translate this word, yet it is not entirely clear what English meaning it has.

Evaluating MT systems

◮ We’ve seen some translation systems, and we know that translation is hard.

◮ The question now is: how do we evaluate MT systems, in particular for large corporations, which are likely users?

◮ How much change to the current setup will the MT system force?

◮ How will it fit in with word processors and other software?

◮ Will the company selling the MT system be around in the next few years for support and updates?

◮ How fast is the MT system?

◮ How good is the MT system (quality)?

Evaluating quality

◮ Intelligibility = how understandable the output is

◮ Accuracy = how faithful the output is to the input

◮ Error analysis = how many errors we have to sort through (and how the errors affect intelligibility & accuracy)

◮ Test suite = a set of sentences that our system should be able to handle
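A test suite can be run automatically if each source sentence is paired with its acceptable translations. The minimal sketch below does this. The suite and the toy system outputs are hypothetical. Exact string matching is also a crude assumption, and in practice human judges or a looser comparison would usually be needed.

```python
def run_test_suite(translate, suite):
    """Run each (source, acceptable_outputs) pair; return pass rate and failures."""
    failures = [src for src, acceptable in suite if translate(src) not in acceptable]
    return 1 - len(failures) / len(suite), failures


# Hypothetical test suite: each source sentence with its acceptable translations.
SUITE = [
    ("His name is Jerome.", {"Il s'appelle Jerome."}),
    ("He kicked the bucket.", {"Il est mort.", "Il a cassé sa pipe."}),
]

# Hypothetical system outputs; this one translates the idiom literally.
TOY_SYSTEM = {
    "His name is Jerome.": "Il s'appelle Jerome.",
    "He kicked the bucket.": "Il a donné un coup de pied au seau.",
}

rate, failed = run_test_suite(lambda s: TOY_SYSTEM.get(s, ""), SUITE)
print(rate, failed)  # 0.5 ['He kicked the bucket.']
```

The list of failing sentences is the starting point for error analysis. The pass rate alone says nothing about why the system failed.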

Intelligibility

Intelligibility Scale (from Arnold et al., 1994)

1. The sentence is perfectly clear and intelligible. It is grammatical and reads like ordinary text.

2. The sentence is generally clear and intelligible. Despite some inaccuracies or infelicities of the sentence, one can understand (almost) immediately what it means.

3. The general idea of the sentence is intelligible only after considerable study. The sentence contains grammatical errors and/or poor word choices.

4. The sentence is unintelligible. Studying the meaning of the sentence is hopeless; even allowing for context, one feels that guessing would be too unreliable.
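Once human judges have rated each output sentence on this scale, the ratings can be summarized. On this scale, lower is better. The sketch below reports the mean, the share of sentences rated 1 or 2, and the full distribution. Treating levels 1 and 2 as "usable" is an assumption made for illustration. Averaging an ordinal scale is also a simplification, so the distribution is worth reporting alongside the mean.

```python
from collections import Counter


def summarize(ratings):
    """Summarize ratings on the 1-4 scale (1 = perfectly clear, 4 = unintelligible)."""
    if not ratings or any(r not in (1, 2, 3, 4) for r in ratings):
        raise ValueError("expected a non-empty list of ratings from 1 to 4")
    counts = Counter(ratings)
    mean = sum(ratings) / len(ratings)
    # Assumption: levels 1 and 2 count as "understandable without study".
    usable = (counts[1] + counts[2]) / len(ratings)
    return mean, usable, {level: counts[level] for level in (1, 2, 3, 4)}


print(summarize([1, 2, 2, 3, 4, 1]))
```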


Further reading

Some of the examples are adapted from the following books:

◮ Doug J. Arnold, Lorna Balkan, Siety Meijer, R. Lee Humphreys and Louisa Sadler (1994). Machine Translation: an Introductory Guide. London: Blackwells-NCC. Available from http://www.essex.ac.uk/linguistics/clmt/MTbook/

◮ Daniel Jurafsky and James H. Martin (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics. Prentice-Hall. More info at http://www.cs.colorado.edu/~martin/slp.html