Using word alignments to assist computer-aided translation users by - - PowerPoint PPT Presentation



SLIDE 1

Introduction Related Work Methodology Experiments and Results Conclusion Current and future Work

Using word alignments to assist computer-aided translation users by marking which target-side words to change or keep unedited

Miquel Esplà-Gomis, Felipe Sánchez-Martínez, Mikel L. Forcada
{mespla,fsanchez,mlf}@dlsi.ua.es

Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, E-03071 Alacant, Spain

15th Annual Conference of the EAMT, Leuven, May 30, 2011

SLIDE 2

Outline

1. Introduction
2. Related Work
3. Methodology
4. Experiments and Results
5. Conclusion
6. Current and future Work

SLIDE 7

Translation Memories

English                                            Catalan
s1: European Association for Machine Translation   t1: Associació Europea per a la Traducció Automàtica
s2: The EAMT is a member of the IAMT               t2: L’EAMT és membre de l’IAMT
s3: current year’s conference is held in Leuven    t3: el congrés d’enguany se celebra a Lovaina
...                                                ...

New sentence s′: The AMTA is a member of the IAMT
Best match s2: The EAMT is a member of the IAMT
Proposal t2: L’EAMT és membre de l’IAMT

SLIDE 9

Fuzzy Matching Scores

Fuzzy matching scores measure the similarity between segments s′ (segment to be translated) and s_i (matching segment in the translation memory):

$$\mathrm{score}(s', s_i) = 1 - \frac{\mathrm{EditDistance}(s', s_i)}{\max(|s'|, |s_i|)}$$

Example

s′: The Association for Machine Translation in the Americas is the American branch of the IAMT
s_i: The European Association for Machine Translation is a member of the IAMT

$$\mathrm{score}(s', s_i) = 1 - \frac{7}{15} \simeq 0.53$$
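As a rough illustration of the score above, the following Python sketch computes a word-level edit distance and the resulting fuzzy-match score. It assumes whitespace tokenisation and unit costs for insertion, deletion and substitution, which the slides do not specify; the function names are hypothetical and not taken from any CAT tool.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between token lists a and b (unit costs assumed)."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,           # delete word_a
                            curr[j - 1] + 1,       # insert word_b
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def fuzzy_match_score(s_new, s_tm):
    """score(s', s_i) = 1 - EditDistance(s', s_i) / max(|s'|, |s_i|)."""
    a, b = s_new.split(), s_tm.split()
    if not a and not b:
        return 1.0  # assumption: two empty segments count as identical
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


s_new = ("The Association for Machine Translation in the Americas "
         "is the American branch of the IAMT")
s_tm = "The European Association for Machine Translation is a member of the IAMT"
print(round(fuzzy_match_score(s_new, s_tm), 2))  # 0.53
```

On the example from the slide, the distance is 7 over a maximum length of 15 words, which gives the same value of about 0.53.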

SLIDE 10

Translation-Memory Based CAT Tools

SLIDE 11

Fuzzy Match Scores + Alignment

Edit distance provides information about the matching words between s′ and s_i:

Example

s′:  the Asia-Pacific Association for Machine Translation
s_i: the European Association for Machine Translation
t_i: l’ Associació Europea per a la Traducció Automàtica

SLIDE 12

Fuzzy Match Scores + Alignment

Word alignment may be used to “project” source-side matching information onto t_i to suggest which words to change and which to keep unedited:

Example

s′:  the Asia-Pacific Association for Machine Translation
s_i: the European Association for Machine Translation
t_i: l’ Associació Europea per a la Traducció Automàtica

[Figure: lines linking the matching words of s′ and s_i, and the word alignments between s_i and t_i]
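The first half of this projection, deciding which words of s_i are matched in s′, can be sketched as follows. The sketch uses Python's `difflib.SequenceMatcher` as a stand-in for the edit-distance alignment mentioned on the slide: it finds matching blocks rather than a minimum-cost edit script, so results may differ on harder sentence pairs. `matched_flags` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def matched_flags(s_tm_tokens, s_new_tokens):
    """Return one boolean per token of s_i: True if the token is matched in s'."""
    flags = [False] * len(s_tm_tokens)
    matcher = SequenceMatcher(None, s_tm_tokens, s_new_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block covers s_tm_tokens[block.a : block.a + block.size]
        for k in range(block.a, block.a + block.size):
            flags[k] = True
    return flags


s_tm = "the European Association for Machine Translation".split()
s_new = "the Asia-Pacific Association for Machine Translation".split()
for word, flag in zip(s_tm, matched_flags(s_tm, s_new)):
    print(word, "matched" if flag else "unmatched")
```

Here only "European" comes out as unmatched, so any target word aligned solely with it would be a candidate for change.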

SLIDE 14

Related Work

  • Simard (2003): statistical MT techniques allow exploiting TMs at the sub-segment (sub-sentential) level: translation spotting
  • Bourdaillet et al. (2009): similar approach for a bilingual concordancer, TransSearch
  • Kranias and Samiotou (2004): sub-segment level alignments using a bilingual dictionary to (i) detect words to be changed and (ii) propose translations for them

SLIDE 16

Rationale

[Figure: words v_ik (matched with s′) and v_ik′ (unmatched with s′) of s_i aligned with words w_ij [keep] and w_ij′ [change] of t_i; a third word w_ij″ has no aligned source word and is labelled [?]]

  • w_ij and v_ik aligned and v_ik matched ⇒ keep w_ij
  • w_ij and v_ik aligned and v_ik not matched ⇒ change w_ij
  • w_ij not aligned ⇒ ???

SLIDE 17

Rationale

What to do if there is more than one alignment with contradictory evidence?

[Figure: a word w_ij of t_i, labelled [???], aligned with two words of s_i: v_ik (matched with s′) and v_ik′ (unmatched with s′)]

SLIDE 18

Rationale

We define the likelihood of keeping the word w_ij unedited as:

$$f_K(w_{ij}, s', s_i, t_i) = \frac{\sum_{v_{ik} \in \mathrm{aligned}(w_{ij})} \mathrm{matched}(v_{ik})}{|\mathrm{aligned}(w_{ij})|}$$

aligned(w_ij): set of source-side words aligned with w_ij in s_i
matched(v_ik): 1 if v_ik is matched in s′ and 0 otherwise

SLIDE 19

Interpretation of f_K(w_ij, s′, s_i, t_i)

Two ways to interpret f_K(w_ij, s′, s_i, t_i):

Unanimity:
  • if f_K(w_ij, s′, s_i, t_i) = 1: w_ij → keep unedited
  • if f_K(w_ij, s′, s_i, t_i) = 0: w_ij → change
  • otherwise → not marked

Majority:
  • if f_K(w_ij, s′, s_i, t_i) > 1/2: w_ij → keep unedited
  • if f_K(w_ij, s′, s_i, t_i) < 1/2: w_ij → change
  • otherwise → not marked
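The likelihood f_K and the two criteria can be sketched in Python as below. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, alignments are assumed to be given as lists of source positions per target word, and the alignment of "missed" with "echó de menos" in the demo is an assumption read off the he/missed/his/brother example in this deck.

```python
from fractions import Fraction


def keep_likelihood(aligned_source_positions, matched):
    """f_K for one target word: share of its aligned source words matched in s'.

    Returns None when the word has no alignment (f_K is then undefined).
    """
    if not aligned_source_positions:
        return None
    hits = sum(1 for k in aligned_source_positions if matched[k])
    return Fraction(hits, len(aligned_source_positions))


def unanimity(f_k):
    """Mark only when all aligned source words agree."""
    if f_k == 1:
        return "keep"
    if f_k == 0:
        return "change"
    return "?"  # unaligned or contradictory evidence: not marked


def majority(f_k):
    """Mark according to the majority of aligned source words."""
    if f_k is None or f_k == Fraction(1, 2):
        return "?"
    return "keep" if f_k > Fraction(1, 2) else "change"


# s_i = él echó de menos a su hermano; s' = ella echó de casa a su hermano
matched = [False, True, True, False, True, True, True]
# Assumed alignment: target word -> positions of aligned words in s_i
alignment = {"he": [0], "missed": [1, 2, 3], "his": [5], "brother": [6]}

for word, positions in alignment.items():
    f_k = keep_likelihood(positions, matched)
    print(word, f_k, unanimity(f_k), majority(f_k))
```

Under these assumptions "missed" gets f_K = 2/3, so it is left unmarked by unanimity and marked as keep by majority, while the other three words receive the same label under both criteria.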
SLIDE 20

Example of Unanimity Criterion

t_i: he [change]  missed [?]  his [keep]  brother [keep]
s_i: él echó de menos a su hermano
s′:  ella echó de casa a su hermano

[Figure: alignment links between the words of t_i and s_i]

SLIDE 21

Example of Majority Criterion

t_i: he [change]  missed [keep]  his [keep]  brother [keep]
s_i: él echó de menos a su hermano
s′:  ella echó de casa a su hermano

[Figure: alignment links between the words of t_i and s_i]

SLIDE 23

Corpora

SLIDE 24

Evaluation Metrics

$$\mathrm{Accuracy} = \frac{\text{correctly marked words}}{\text{marked words}}$$

$$\mathrm{Coverage} = \frac{\text{marked words}}{\text{total words}}$$
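Both metrics are simple ratios, as the sketch below shows. It assumes that each target word carries a predicted label ("keep", "change", or None when not marked) and a gold label; how gold labels are obtained is not stated on the slide, so the data here is made up purely for illustration, and returning 0.0 when nothing is marked is a choice of this sketch.

```python
def accuracy_and_coverage(predicted, gold):
    """Accuracy = correctly marked / marked; coverage = marked / total words.

    predicted: 'keep', 'change', or None (not marked) for each target word.
    gold: 'keep' or 'change' for each target word.
    """
    marked = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(1 for p, g in marked if p == g)
    accuracy = correct / len(marked) if marked else 0.0  # assumption: 0.0 if nothing marked
    coverage = len(marked) / len(predicted) if predicted else 0.0
    return accuracy, coverage


# Illustrative labels only (not taken from the experiments)
predicted = ["change", None, "keep", "keep"]
gold = ["change", "change", "keep", "change"]
accuracy, coverage = accuracy_and_coverage(predicted, gold)
print(f"accuracy={accuracy:.2f} coverage={coverage:.2f}")
```

Leaving a word unmarked lowers coverage but cannot lower accuracy, which is why the two figures have to be read together.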

SLIDE 25

Statistical Word Alignment

  • We use the free/open-source tool GIZA++ (Och and Ney, 2003)
  • We obtain an SL-to-TL alignment and a TL-to-SL alignment on the TM
  • We experiment with three ways to combine the alignments:
      - union
      - intersection
      - grow-diag-final-and
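The two simplest combinations can be pictured with plain set operations. The sketch below assumes each directional alignment has already been converted to a set of (source index, target index) pairs; it does not call GIZA++, the link data is a toy example, and grow-diag-final-and (a heuristic whose result contains the intersection and is contained in the union) is deliberately left out.

```python
def combine_union(src_to_tgt, tgt_to_src):
    """Keep every link proposed by either directional alignment."""
    return src_to_tgt | tgt_to_src


def combine_intersection(src_to_tgt, tgt_to_src):
    """Keep only links proposed by both directional alignments."""
    return src_to_tgt & tgt_to_src


# Toy alignments as (source_index, target_index) pairs
src_to_tgt = {(0, 0), (1, 1), (2, 1)}
tgt_to_src = {(0, 0), (1, 1), (3, 2)}

print(sorted(combine_union(src_to_tgt, tgt_to_src)))
print(sorted(combine_intersection(src_to_tgt, tgt_to_src)))
```

Union yields more links, so more target words can be marked, while intersection keeps fewer but more reliable links.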

SLIDE 31

Experimental Settings

We tried our approach comparing:
  • the use of three different methods to combine the alignments generated with GIZA++
  • the use of both criteria defined to use the likelihood f_K (unanimity or majority)
  • the use of alignment models trained on:
      - the corpus to be aligned itself
      - a separate in-domain corpus
      - a separate out-of-domain corpus

SLIDE 32

Corpora

SLIDE 35

Results for the Majority/Unanimity Criteria

[Two plots: accuracy (%) and coverage (%) against the fuzzy-matching threshold (%), each comparing the majority and unanimity criteria with union alignments]

SLIDE 36

Results for the Different Alignment Models

[Two plots: accuracy (%) and coverage (%) against the fuzzy-matching threshold (%), each comparing self, in-domain and out-of-domain alignment models with union alignments]

SLIDE 38

Concluding Remarks

  • new method to improve TM-based CAT tools
  • the predictability of fuzzy-match scores, and the high confidence translators place in them, are kept
  • accuracy over 94% for fuzzy-match thresholds between 60% and 90%
  • it is possible to reuse statistical alignment models from different corpora with a small loss in accuracy (but a larger loss in coverage)

SLIDE 40

Current and future Work

Current:
  • surveying translators about the usefulness of target-side colouring (visit the survey at http://transducens.dlsi.ua.es/people/fsanchez/survey.html)
  • using MT to inform aligners and classifiers to colour target words in proposals on the fly (no need to train the aligner on a corpus)

Future:
  • integration in the OmegaT free/open-source CAT system

SLIDE 41

License

HEEL ERG BEDANKT! MOLTES GRÀCIES! (Thank you very much!)

Acknowledgements: Partially funded by the Spanish government through project TIN2009-14009-C02-01. Good ideas from Yanjun Ma, Andy Way and Harold Somers: thanks!

License: This work may be distributed under the terms of either
  • the Creative Commons Attribution–Share Alike license: http://creativecommons.org/licenses/by-sa/3.0/
  • the GNU GPL v. 3.0 License: http://www.gnu.org/licenses/gpl.html

Dual license! E-mail me to get the sources: mespla@dlsi.ua.es