SLIDE 1

Comparative human and automatic evaluation of glass-box and black-box approaches to interactive translation prediction

Daniel Torregrosa, Juan Antonio Pérez-Ortiz, Mikel L. Forcada

Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, Spain

EAMT2017

Daniel Torregrosa (Univ. Alacant) Comparing black and glass-box ITP EAMT2017 1 / 37

SLIDE 2

Outline

1. Introduction
2. Automatic evaluation
3. Human evaluation
4. Summary

SLIDE 3

Abstract

◮ Interactive translation prediction (ITP) offers suggestions while the translator is writing the translation.
◮ We compare black-box and glass-box ITP for the first time.
◮ Translators can potentially save 20–50% of keystrokes using ITP.
◮ All software used for the comparison is free/open-source.

SLIDE 4

Outline

1. Introduction
   ◮ Translation technologies
   ◮ Glass-box interactive translation prediction
   ◮ Black-box interactive translation prediction
2. Automatic evaluation
3. Human evaluation
4. Summary

SLIDE 5

Translation technologies

Professional translators often use translation technologies such as

◮ dictionaries
◮ bilingual concordancers (Reverso Context, Linguee)
◮ translation memories
◮ machine translation
◮ post-editing
◮ interactive translation prediction

to achieve better, faster translations.

SLIDE 6

Computer-assisted translation

Interactive translation prediction

Interactive translation prediction (ITP) offers suggestions as the translator types the translation.
Existing approaches in the literature are glass-box: the inner workings of an SMT system are queried to provide ITP.
We have proposed a black-box approach¹ that can use any bilingual resource to provide ITP.

¹ Torregrosa Rivero, Daniel, Mikel L. Forcada, and Juan Antonio Pérez-Ortiz. “An Open-Source Web-Based Tool for Resource-Agnostic Interactive Translation Prediction.” (2014).

SLIDE 7

Glass-box ITP

Glass-box ITP typically uses a modified or tailor-made SMT system that can also provide additional information, such as word alignments, alternative translations and translation probabilities.

Recently, neural MT has also been used to provide ITP:

◮ unlike with SMT, access to the internals is not needed
◮ the decoding process is modified so that it can accept a prefix

SLIDE 8

Glass-box ITP

Example

Source sentence

er geht ja nicht nach hause

Target translation

he does not go home

In this example, we use the decoder of a modified statistical machine translation system. The translator types a prefix of the translation and gets, as a suggestion, the best path through the decoder's search graph that is compatible with that prefix.

Based on Statistical Machine Translation (2009) by Philipp Koehn

SLIDE 9

Statistical Machine Translation

Decoder

Source sentence

er geht ja nicht nach hause

SLIDE 10

Glass-box ITP

Example I

Typed prefix

he

SLIDE 11

Glass-box ITP

Example II

Typed prefix

he d

SLIDE 12

Glass-box ITP

Example III

Typed prefix

he does not go home
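The prefix-constrained suggestion in this example can be sketched as a search over a decoder's search graph, keeping only the paths whose text starts with what the translator has typed. This is a minimal illustration under strong assumptions: the graph, its words and its log-probabilities are invented (they are not Koehn's figure or Thot's output), all complete paths are enumerated exhaustively, and a prefix that no path produces simply yields no suggestion. Real ITP decoders search the graph far more efficiently and have to cope with prefixes that the graph cannot generate.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical toy search graph for "er geht ja nicht nach hause":
# state -> list of (target word, next state, log-probability).
# Words and scores are invented for illustration only.
SearchGraph = Dict[int, List[Tuple[str, int, float]]]

GRAPH: SearchGraph = {
    0: [("he", 1, -0.5), ("it", 1, -1.2)],
    1: [("goes", 2, -0.7), ("does", 3, -1.4), ("is", 4, -1.6)],
    2: [("not", 5, -0.6)],
    3: [("not", 6, -0.2)],
    4: [("not", 7, -0.3)],
    5: [("home", 9, -0.9), ("to", 8, -1.0)],
    6: [("go", 5, -0.3)],
    7: [("going", 5, -0.5)],
    8: [("house", 9, -0.8)],
}
FINAL_STATE = 9


def complete_paths(graph: SearchGraph, state: int = 0) -> Iterator[Tuple[List[str], float]]:
    """Yield (words, total log-probability) for every path from `state` to the final state."""
    if state == FINAL_STATE:
        yield [], 0.0
        return
    for word, next_state, logp in graph.get(state, []):
        for rest, score in complete_paths(graph, next_state):
            yield [word] + rest, logp + score


def suggest(graph: SearchGraph, prefix: str) -> Optional[str]:
    """Return the best-scoring full translation whose text starts with `prefix`.

    Matching is done on characters, so a partially typed last word ("he d")
    is allowed. Returns None when no path is compatible with the prefix.
    """
    best: Optional[Tuple[float, str]] = None
    for words, score in complete_paths(graph):
        sentence = " ".join(words)
        if sentence.startswith(prefix) and (best is None or score > best[0]):
            best = (score, sentence)
    return best[1] if best is not None else None


if __name__ == "__main__":
    for typed in ["he", "he d", "he does not go home"]:
        print(f"{typed!r} -> {suggest(GRAPH, typed)!r}")
    # 'he' -> 'he goes not home'
    # 'he d' -> 'he does not go home'
    # 'he does not go home' -> 'he does not go home'
```

Typing "d" rules out the overall best path ("he goes not home") and the suggestion switches to the best compatible one. Exhaustive enumeration grows exponentially with the graph size; it is used here only because it keeps the prefix constraint easy to read.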

SLIDE 13

Black-box ITP

To generate translation suggestions, black-box ITP can use any kind of bilingual resource that provides one or more translations for a sentence. This lets us seamlessly integrate any kind of resource without needing to know how it works.

SLIDE 14

Black-box ITP

Generating suggestions

Source sentence: This studio is spacious
Translations of subsegments of length 1: este, estudio, es, espacioso
Translations of subsegments of length 2: este estudio, estudio es, es amplio
Translations of subsegments of length 3: este estudio está, estudio es amplio
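The suggestion-generation step above can be sketched as follows: split the source sentence into every contiguous subsegment up to a maximum length, translate each one with whatever bilingual resource is available, and keep the results as candidate suggestions. The `translate` callable stands in for that resource; here it is a hypothetical lookup table filled with the translations shown on the slide, and the maximum length of 3 simply mirrors the example.

```python
from typing import Callable, List, Set


def subsegments(tokens: List[str], max_len: int) -> List[List[str]]:
    """All contiguous subsegments of 1..max_len tokens, shortest first."""
    return [
        tokens[start:start + length]
        for length in range(1, max_len + 1)
        for start in range(len(tokens) - length + 1)
    ]


def generate_suggestions(source: str, translate: Callable[[str], str], max_len: int = 3) -> Set[str]:
    """Translate every source subsegment with a black-box bilingual resource."""
    tokens = source.split()
    return {translate(" ".join(segment)) for segment in subsegments(tokens, max_len)}


# Hypothetical bilingual resource: a lookup table with the slide's translations.
TOY_MT = {
    "This": "este", "studio": "estudio", "is": "es", "spacious": "espacioso",
    "This studio": "este estudio", "studio is": "estudio es", "is spacious": "es amplio",
    "This studio is": "este estudio está", "studio is spacious": "estudio es amplio",
}

if __name__ == "__main__":
    for suggestion in sorted(generate_suggestions("This studio is spacious", TOY_MT.__getitem__)):
        print(suggestion)
```

Because the resource is only ever called with a string and returns a string, the lookup table could be swapped for an MT system, a translation memory or a dictionary without changing the rest of the code, which is the point of the black-box approach.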

SLIDE 15

Black-box ITP

Offering suggestions

Typed prefix: Este e
Proposals: estudio es, estudio es amplio, estudio, este, es amplio, espacioso
We need to rank and select which suggestions to show.
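The selection problem raised here can be sketched as: keep only the candidates that start with the partially typed word, rank them, and show at most M of them. The ranking below is a deliberately naive stand-in (longer suggestions first, ties broken alphabetically); Forecat ranks suggestions with a trained multilayer perceptron, which is not reproduced here. The default M = 4 matches the black-box configuration used in the human evaluation; the function name `offer` is ours.

```python
from typing import Iterable, List


def offer(prefix: str, candidates: Iterable[str], max_suggestions: int = 4) -> List[str]:
    """Select up to `max_suggestions` candidates compatible with the word being typed."""
    # The partial word is whatever follows the last space: "Este e" -> "e".
    partial = prefix.split(" ")[-1].lower()
    compatible = {c for c in candidates if c.lower().startswith(partial)}
    # Stand-in ranking (not Forecat's model): longer suggestions first,
    # ties broken alphabetically so the output is deterministic.
    ranked = sorted(compatible, key=lambda c: (-len(c.split()), c))
    return ranked[:max_suggestions]


CANDIDATES = [
    "este", "estudio", "es", "espacioso", "este estudio", "estudio es",
    "es amplio", "este estudio está", "estudio es amplio",
]

if __name__ == "__main__":
    print(offer("Este estu", CANDIDATES))
    # ['estudio es amplio', 'estudio es', 'estudio']
```

With a one-letter partial word such as "e", every candidate in this example is compatible, which is exactly why a good ranking model matters.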

SLIDE 20

Black-box ITP

Example

Source sentence: this studio is spacious
Target sentence: este estudio es amplio
Prefix: e
Suggestions: este estudio está, este, estudio es amplio, estudio
Prefix: este e
Suggestions: estudio es amplio, estudio, es amplio, es
Prefix: este estudio es amplio

SLIDE 21

Outline

1. Introduction
2. Automatic evaluation
   ◮ Software
   ◮ Method
   ◮ Metrics
   ◮ Results
3. Human evaluation
4. Summary

SLIDE 22

Software

Glass-box ITP

As a glass-box implementation, we use Thot:

◮ it suggests one translation completion that is automatically updated as the user types the prefix
◮ it can also be used as an SMT system
◮ it was trained on 1 000 000 sentences from the United Nations Parallel Corpus v1.0²
   ⋆ motivated by lack of resources
   ⋆ a bilingual domain adaptation technique³ was used to minimize the impact of reducing the size of the corpus
   ⋆ excerpts of this corpus will be used for testing

² http://conferences.unite.un.org/UNCorpus
³ Axelrod, Amittai, Xiaodong He, and Jianfeng Gao. “Domain adaptation via pseudo in-domain data selection.” (EMNLP 2011)

SLIDE 23

Software

Black-box ITP

As a black-box implementation, we use Forecat, with Thot SMT as the only bilingual resource.
We use a multilayer perceptron² to rank the black-box suggestions. Some of its features:

◮ source and target position and length of the suggestion
◮ alignment model
◮ position with respect to the last used suggestion: before, after, overlapping...

² With ≈ 10⁴ parameters.
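One of the listed features, the position of a candidate with respect to the last used suggestion, can be sketched below together with a few position and length features. This is a hypothetical encoding for illustration only: the `Suggestion` fields, the normalisation by sentence length and the one-hot layout are our assumptions, target position and the alignment-model feature are omitted, and Forecat's real feature set and network are not reproduced.

```python
from dataclasses import dataclass
from typing import List, Optional

POSITIONS = ["before", "after", "overlapping", "none"]


@dataclass
class Suggestion:
    source_start: int  # index of the first source word covered (hypothetical field)
    source_len: int    # number of source words covered
    target: str        # translation offered to the user


def relative_position(s: Suggestion, last: Optional[Suggestion]) -> str:
    """Where the candidate's source span lies relative to the last used suggestion."""
    if last is None:
        return "none"
    if s.source_start + s.source_len <= last.source_start:
        return "before"
    if s.source_start >= last.source_start + last.source_len:
        return "after"
    return "overlapping"


def features(s: Suggestion, last: Optional[Suggestion], sentence_len: int) -> List[float]:
    """Toy feature vector: normalised source position/length, target length, one-hot position."""
    position = relative_position(s, last)
    one_hot = [1.0 if position == p else 0.0 for p in POSITIONS]
    return [
        s.source_start / sentence_len,
        s.source_len / sentence_len,
        float(len(s.target.split())),
    ] + one_hot


if __name__ == "__main__":
    last_used = Suggestion(0, 2, "este estudio")  # covered "This studio"
    candidate = Suggestion(2, 2, "es amplio")     # covers "is spacious"
    print(relative_position(candidate, last_used))  # after
    print(features(candidate, last_used, 4))        # [0.5, 0.5, 2.0, 0.0, 1.0, 0.0, 0.0]
```

A vector like this would be the input to the ranking network; the intuition behind the position feature is that a translator writing monotonically is most likely to need a suggestion that continues right after the last one used.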

SLIDE 24

Automatic evaluation

Method

We simulate the behaviour of a professional translator

◮ who has a planned, immutable translation in mind
◮ who writes monotonically
◮ who makes no mistakes
◮ who reads all the proposed suggestions, evaluates them all, and uses the longest suggestion or suggestion prefix that fits (if any)
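The simulated translator described above can be sketched as a loop over the planned (reference) translation. Several details are our assumptions, not taken from the slides: suggestions are represented as the text they would append, accepting a suggestion or a suggestion prefix costs one keystroke, a suggestion is only accepted when it saves more than one key, and a suggestion prefix may end at any character. `toy_propose` is a hypothetical black-box-style proposer built from the candidate translations of the black-box example.

```python
from typing import Callable, Iterable, List


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def simulate(reference: str, propose: Callable[[str], Iterable[str]]) -> int:
    """Keystrokes needed by an idealised user who types `reference` monotonically.

    `propose(typed)` returns completions, i.e. text to append to what is typed.
    Assumption: accepting (a prefix of) a suggestion costs one keystroke.
    """
    typed, keystrokes = "", 0
    while typed != reference:
        remaining = reference[len(typed):]
        # Longest suggestion prefix that fits the planned translation.
        gain = max((common_prefix_len(c, remaining) for c in propose(typed)), default=0)
        typed += remaining[:gain] if gain > 1 else remaining[0]
        keystrokes += 1
    return keystrokes


PHRASES = ["este", "estudio", "es", "espacioso", "este estudio", "estudio es",
           "es amplio", "este estudio está", "estudio es amplio"]


def toy_propose(typed: str) -> List[str]:
    """Hypothetical proposer: complete the partially typed word with any phrase."""
    partial = typed[typed.rfind(" ") + 1:]
    if not partial:
        return []
    return [p[len(partial):] for p in PHRASES if p.startswith(partial)]


if __name__ == "__main__":
    reference = "este estudio es amplio"
    keys = simulate(reference, toy_propose)
    print(keys, len(reference))  # 3 22
```

With these toy suggestions the simulated user types "e", accepts "ste estudio es" (a prefix of "este estudio está"), then accepts " amplio", for 3 keystrokes over 22 characters; with a proposer that never suggests anything, the same loop needs 22 keystrokes.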

SLIDE 25

Automatic evaluation

Metrics

We measure the keystroke ratio (KSR), the ratio between the number of keys typed and the length (in characters) of the final translation:

◮ KSR < 1 means we saved some of the keystrokes by using suggestions
◮ if we type the translation without mistakes and use no suggestions, we get KSR = 1
◮ KSR > 1 means we used extra keystrokes, e.g. because the user mistyped or rethought the translation halfway
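The three KSR cases above amount to one division. In this sketch the keystroke counts are hypothetical, and we assume that fixing a typo costs two extra keys (the wrong key plus a backspace).

```python
def keystroke_ratio(keystrokes: int, final_translation: str) -> float:
    """KSR: keys pressed divided by the length (in characters) of the final translation."""
    return keystrokes / len(final_translation)


if __name__ == "__main__":
    final = "este estudio es amplio"       # 22 characters
    print(keystroke_ratio(22, final))      # typed faultlessly, no suggestions: 1.0
    print(keystroke_ratio(11, final))      # half of the keys saved: 0.5
    print(keystroke_ratio(24, final) > 1)  # one typo fixed with a backspace: True
```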

SLIDE 26

Automatic evaluation

Results: KSR

[Figure: KSR (lower is better) for en→ar, ar→en, en→es, es→en, en→zh and zh→en; systems: Black-box M=1, Black-box M=4, Glass-box]

M = maximum number of suggestions. All differences between the values are statistically significant (p ≤ 0.05).

SLIDE 27

Automatic evaluation

Results: Average number of words shown to the user

System            Avg. shown words
Black-box M = 1   1.4 (2.3)²
Black-box M = 4   5 (7.5)
Glass-box         20

M = maximum number of suggestions.

² Figures in parentheses exclude the steps where no suggestion is shown.

SLIDE 28

Outline

1. Introduction
2. Automatic evaluation
3. Human evaluation
   ◮ Method and profile of the subjects
   ◮ Metrics and results
4. Summary

SLIDE 29

Human evaluation

Profiles

The human evaluation was performed for translation from English to Spanish.
Profile of the 8 subjects:

◮ native Spanish speakers
◮ computer science researchers
   ⋆ limited working proficiency in English
   ⋆ experienced typists (except one)
◮ no experience as translators

A more extensive evaluation with professional translators using a similar setup will be carried out soon.

SLIDE 30

Human evaluation

Method

They had to translate 20 sentences arranged in 4 blocks, SB1 to SB4.
The glass-box approach (Thot) offers 1 whole-sentence suggestion.
The black-box approach (Forecat) offers 4 multi-word suggestions.

Users    Induction  Unassisted  Black-box  Glass-box
U1, U5   SB1        SB2         SB3        SB4
U2, U6   SB4        SB1         SB2        SB3
U3, U7   SB3        SB4         SB1        SB2
U4, U8   SB2        SB3         SB4        SB1
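The block assignment in the table is a cyclic Latin square: each pair of users sees the block sequence shifted one position further, so across the four pairs every block is used exactly once under each condition. The sketch below reproduces the table; the function and variable names are ours.

```python
from typing import Dict, List

TASKS = ["Induction", "Unassisted", "Black-box", "Glass-box"]
BLOCKS = ["SB1", "SB2", "SB3", "SB4"]
GROUPS = ["U1, U5", "U2, U6", "U3, U7", "U4, U8"]


def rotated_assignment(groups: List[str]) -> Dict[str, Dict[str, str]]:
    """Cyclic Latin square: group g uses the block sequence shifted right by g positions."""
    plan: Dict[str, Dict[str, str]] = {}
    for g, group in enumerate(groups):
        plan[group] = {task: BLOCKS[(t - g) % len(BLOCKS)] for t, task in enumerate(TASKS)}
    return plan


if __name__ == "__main__":
    for group, row in rotated_assignment(GROUPS).items():
        print(group.ljust(8), "  ".join(row[task] for task in TASKS))
```

The rotation balances out differences in difficulty between sentence blocks, so that no condition is systematically paired with an easier or harder block.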

SLIDE 31

Human evaluation

Metrics I

∆KSR

◮ The offset between the assisted KSR and the unassisted KSR (assisted minus unassisted): lower is better

SLIDE 32

Human evaluation

Metrics II

Translation speed

◮ The ratio of the number of characters of the final translation to the time elapsed while producing it
   ⋆ If creating a 100-character translation takes 100 seconds, including the time spent reading the source sentence and thinking of the translation, the translation speed is 1 Tc/s
◮ Measured in target characters per second, Tc/s

∆Tc/s

◮ The offset between the assisted Tc/s and the unassisted Tc/s (assisted minus unassisted): higher is better
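The speed metric and both offsets can be worked through in a few lines. The unassisted session reuses the slide's example (100 characters in 100 seconds); the assisted timing and both KSR values are hypothetical, and the sign convention (assisted minus unassisted) is our reading of "lower is better" for ∆KSR and "higher is better" for ∆Tc/s.

```python
from typing import Tuple


def translation_speed(num_chars: int, seconds: float) -> float:
    """Tc/s: characters of the final translation per second, reading and thinking time included."""
    return num_chars / seconds


def offsets(ksr_assisted: float, ksr_unassisted: float,
            tcs_assisted: float, tcs_unassisted: float) -> Tuple[float, float]:
    """Return (delta KSR, delta Tc/s), both computed as assisted minus unassisted."""
    return ksr_assisted - ksr_unassisted, tcs_assisted - tcs_unassisted


if __name__ == "__main__":
    unassisted = translation_speed(100, 100.0)  # the slide's example: 1 Tc/s
    assisted = translation_speed(100, 80.0)     # hypothetical assisted session: 1.25 Tc/s
    d_ksr, d_tcs = offsets(0.70, 1.05, assisted, unassisted)
    print(f"dKSR = {d_ksr:+.2f} (lower is better), dTc/s = {d_tcs:+.2f} (higher is better)")
```

In this made-up case both offsets point the same way (fewer keystrokes and faster translation); the human results show that this does not always happen.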

SLIDE 33

Human evaluation

Comparison with unassisted translation

[Figure: ∆KSR (lower is better) and ∆Tc/s (higher is better) for subjects U1 to U8, with glass-box and black-box assistance]

SLIDE 34

Human evaluation

Absolute values: most improvement (Users 1 and 2)

[Figure: KSR (lower is better) against Tc/s (higher is better) for Users 1 and 2, with glass-box, black-box and unassisted translation]

SLIDE 35

Human evaluation

Absolute values: most deterioration (User 3)

[Figure: KSR (lower is better) against Tc/s (higher is better) for User 3, with glass-box, black-box and unassisted translation]

SLIDE 36

Human evaluation

Absolute values

[Figure: KSR (lower is better) against Tc/s (higher is better) for Users 1 to 8, with glass-box, black-box and unassisted translation]

SLIDE 37

Human evaluation

Metrics III

Emulated keystroke ratio (ESR)

◮ ESR is the KSR measured automatically, using the translations produced during the human test as references
◮ It represents an upper bound on the improvement: the KSR the user would have obtained by accepting the best suggestion every time suggestions were offered

SLIDE 38

Human evaluation

Potential improvement

[Figure: KSR (lower is better) per subject, 1 to 8: black-box KSR, glass-box KSR and ESR]

SLIDE 39

Human evaluation

Analysis of the gap between KSR and ESR

[Figure: black-box KSR (lower is better) for subjects 1 and 2]

Related to ITP

◮ Bad interface
◮ User was not paying attention
◮ User distrusted the suggestions

Unrelated to ITP

◮ Typing mistakes
◮ Rethought translations

SLIDE 40

Human evaluation

Questionnaire results

Users mostly agree that the suggestions helped them to translate (median of 4 on a 5-level Likert scale).
The first glass-box suggestion (a whole translation of the sentence) was praised as very useful.
Test subjects complained about suggestions being offered too often.

SLIDE 41

Human evaluation

Questionnaire results II

        U1   U2   U3   U4   U5   U6   U7   U8
1st     B    G    G    B*   G    G    G    B
2nd     G    B*   B    U    U*   B    B    G*
3rd     U*   U    U*   G    B    U*   U*   U

B = black box, G = glass box, U = unassisted.

Systems ranked according to the perceived speed of translation. The task with the highest translation speed for each user is marked with *.

Only 3 subjects were faster with assistance:

◮ cognitive load may make users think they are translating faster when they are actually translating slower
◮ slower translators get the most benefit

SLIDE 42

Outline

1. Introduction
2. Automatic evaluation
3. Human evaluation
4. Summary

SLIDE 43

Summary

20–50% of keystrokes can potentially be saved using either the black-box or the glass-box approach

◮ compared with faultlessly typed unassisted translation, where KSR = 1
◮ up to 60% for some of the sentences the test subjects translated

Most test subjects agree that both methods are useful... but they are divided when choosing which system is better.

SLIDE 44

Outlook

◮ Explore how offering only the best suggestions affects performance
◮ More extensive evaluation with professional translators and different languages

SLIDE 45

Software developed for this paper

◮ Forecat:

github.com/transducens/forecat

◮ Forecat for OmegaT:

github.com/transducens/forecat-omegat

◮ Thot for OmegaT:

github.com/transducens/thot-omegat

Software used for this paper

◮ OmegaT:

www.omegat.org

◮ Thot:

daormar.github.io/thot

◮ OmegaT SessionLog:

github.com/mespla/OmegaT-SessionLog

Slides: tinyurl.com/eamt2017dtr dtorregrosa@dlsi.ua.es
