Quality Estimation, Christian Buck, University of Edinburgh

slide-1
SLIDE 1

Quality Estimation

Christian Buck, University of Edinburgh

slide-2
SLIDE 2

In this lecture you will ...

  • Lose trust in MT
  • Learn how to trust some MT
  • Learn how to build a complete confidence estimation system
  • Be surprised how easy that is
  • Also be surprised how hard it is
slide-3
SLIDE 3

MT - what is it good for?

  • Making Websites available
  • Skyping with foreign landlords
  • Post-Editing
  • Trading (including HFT)
  • Information Retrieval

Easy to fail at any of these

slide-9
SLIDE 9

(Sentence Level) Quality Estimation

Produce quality score

○ Given source and (machine) translation
○ Without reference translation

Applications:

○ Good enough for publishing (print signs)?
○ Inform readers
○ Hide terrible translation from post-editors
○ Decide between different systems

slide-10
SLIDE 10

Q = f(source, target)

slide-11
SLIDE 11

Q = f(source, target, MT)

slide-12
SLIDE 12

2003 Summer Workshop @ JHU

slide-13
SLIDE 13

What is good quality?

Early work: Predict automatic scores

  • BLEU (~TrustRank)
  • WER
  • [many other scores not yet invented]

Problem: noisy on sentence level

slide-14
SLIDE 14

Good quality for gisting

Content should be comprehensible
Accuracy over Fluency?

Gold standard:

  • Collect feedback from users

○ Likert scores 1-4, 1-5, ...

  • Answer questions
slide-15
SLIDE 15

Good quality for post-editing

  • Time is money
  • Avoid making translators hate their job
  • Fit with workflow
  • Only show MT if speedup expected
  • Measure time, collect interface actions
  • Humans are complicated

slide-16
SLIDE 16

Summary

  1. Specify objective
  2. Get training data
  3. Extract features
  4. Train classifier / regression model
  5. Profit!
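The steps above can be sketched end to end in a few lines. This is a minimal illustration, not the lecture's system: the objective is assumed to be HTER, the four training triples and their labels are invented, the two features are toy choices, and the regression model is plain least squares trained by gradient descent. All function names are hypothetical.

```python
def extract_features(source, target):
    """Two toy features: source length and target/source length ratio."""
    src, tgt = source.split(), target.split()
    return [float(len(src)), len(tgt) / max(len(src), 1)]


def predict(x, w, b):
    """Linear model: weighted sum of features plus bias."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def mse(X, y, w, b):
    """Mean squared error of the model on (X, y)."""
    return sum((predict(xi, w, b) - yi) ** 2 for xi, yi in zip(X, y)) / len(X)


def fit_linear(X, y, lr=0.01, epochs=2000):
    """Least-squares linear regression via batch gradient descent."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errs = [predict(xi, w, b) - yi for xi, yi in zip(X, y)]
        grad_w = [sum(e * xi[j] for e, xi in zip(errs, X)) / len(X)
                  for j in range(d)]
        grad_b = sum(errs) / len(X)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Steps 1 and 2: objective is HTER (lower is better); data is invented.
TRAIN = [
    ("the house is small", "das Haus ist klein", 0.0),
    ("he did not go home yesterday", "er ging gestern nicht heim", 0.2),
    ("Zora told it like it was , said Ella", "Zora sagte es wie es", 0.6),
    ("thank you", "danke", 0.1),
]
X = [extract_features(s, t) for s, t, _ in TRAIN]  # step 3
y = [label for _, _, label in TRAIN]
w, b = fit_linear(X, y)                            # step 4

# Predicted HTER for an unseen (invented) sentence pair.
print(predict(extract_features("she reads a book", "sie liest ein Buch"), w, b))
```

In practice the feature extractor is where most of the work goes; the learner can be swapped for any off-the-shelf classifier or regressor.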
slide-17
SLIDE 17

Necessary tool for human trials

slide-18
SLIDE 18

Features

Think of some features!

slide-19
SLIDE 19

Common good features

  • Source sentence perplexity
  • Number of out-of-vocabulary words
  • Number of words with many translations
  • Number of words in source
  • Mismatched question marks
slide-25
SLIDE 25

Simple source side features

  • Language model score
  • Number of
    ○ Words
    ○ Characters
  • Percentage of
    ○ Proper names
    ○ Numbers
    ○ Punctuation characters
    ○ Very rare/common words/ngrams

Things that make MT difficult
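Several of the simple source-side features listed above need nothing more than string handling. The sketch below assumes whitespace-tokenized input and covers only the counts and percentages that need no external resources; the language model score and proper-name percentage would require a trained LM and a tagger and are left out. The function name and the dictionary keys are made up for illustration.

```python
import string


def source_features(sentence):
    """Count-based source-side features for one whitespace-tokenized sentence."""
    tokens = sentence.split()
    n_tokens = len(tokens)
    n_chars = len(sentence)
    # A token counts as a number if it contains at least one digit.
    n_numbers = sum(1 for t in tokens if any(c.isdigit() for c in t))
    # ASCII punctuation only; other scripts would need a wider definition.
    n_punct = sum(1 for c in sentence if c in string.punctuation)
    return {
        "num_words": n_tokens,
        "num_chars": n_chars,
        "pct_numbers": n_numbers / n_tokens if n_tokens else 0.0,
        "pct_punct_chars": n_punct / n_chars if n_chars else 0.0,
    }


print(source_features("its ratification would require 226 votes"))
```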

slide-26
SLIDE 26

[Plot: HTER against source sentence length. Credits: Shah et al., 2014]

slide-27
SLIDE 27

[Plot: HTER against source LM score. Credits: Shah et al., 2014]

slide-28
SLIDE 28

Hard to translate?

"Zora told it like it was," said Ella Dinkins, 90, one of the Johnson girls Hurston immortalized by quoting men singing off-color songs about their beauty.

slide-33
SLIDE 33

More source side features

Words with many possible translations

English   German                          P(German|English)
work      Arbeit (job, physics, object)   0.4
          arbeiten (to work)              0.2
          Aufgabe (task)                  0.2
          Werk (work of art)              0.1
          Arbeitsplatz (workplace)        0.1
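One way to turn such a table into a feature is to count, per source word, the translations whose probability reaches some threshold, and then average over the sentence. The sketch below uses the entry for "work" from the table; the threshold values and both function names are assumptions for illustration, and words missing from the table simply count as having zero translations.

```python
# Lexical translation table: source word -> {translation: P(translation | word)}.
# The single entry is the example from the slide.
LEX = {
    "work": {
        "Arbeit": 0.4,
        "arbeiten": 0.2,
        "Aufgabe": 0.2,
        "Werk": 0.1,
        "Arbeitsplatz": 0.1,
    }
}


def num_translations(word, table, threshold=0.0):
    """Number of translations of `word` with probability >= threshold."""
    return sum(1 for p in table.get(word, {}).values() if p >= threshold)


def avg_translations(sentence, table, threshold=0.0):
    """Average number of translations per source token."""
    tokens = sentence.split()
    if not tokens:
        return 0.0
    return sum(num_translations(t, table, threshold) for t in tokens) / len(tokens)


print(num_translations("work", LEX, threshold=0.2))
print(avg_translations("work hard", LEX, threshold=0.2))
```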

slide-34
SLIDE 34

Rare and common n-grams

Zora told it like it was ,

Zora told it
told it like
it like it
like it was
it was ,

slide-35
SLIDE 35

Rare and common n-grams

[Zora told it] [told it like] [it like it] [like it was] [it was ,]

n-grams from a large corpus, sorted by count from frequent to infrequent
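A sketch of how this could become two features: extract the sentence's n-grams, look each one up in corpus counts, and report the fraction that are rare and the fraction that are common. The slide sorts corpus n-grams by count; the fixed count cut-offs used here (`rare_max`, `common_min`) are a simplifying assumption, and the toy counts are invented.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rare_common_fractions(sentence, corpus_counts, n=3, rare_max=1, common_min=5):
    """Fractions of the sentence's n-grams that are rare or common in the corpus.

    Unseen n-grams have count 0 and therefore count as rare.
    """
    grams = ngrams(sentence.split(), n)
    if not grams:
        return 0.0, 0.0
    rare = sum(1 for g in grams if corpus_counts[g] <= rare_max)
    common = sum(1 for g in grams if corpus_counts[g] >= common_min)
    return rare / len(grams), common / len(grams)


# Invented counts standing in for a large corpus.
CORPUS_COUNTS = Counter({
    ("it", "was", ","): 10,
    ("like", "it", "was"): 6,
    ("told", "it", "like"): 2,
})

print(rare_common_fractions("Zora told it like it was ,", CORPUS_COUNTS))
```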

slide-38
SLIDE 38

Linguistic features: POS

  • Part of speech (POS) LM

○ on source or target side

  • LEPOR (~BLEU on POS Tags)
slide-39
SLIDE 39

LEPOR

its ratification would require 226 votes
seine Ratifizierung erfordern wuerde 226

Example from: Han et al. (2014)

slide-40
SLIDE 40

LEPOR

its    ratification    would      require   226   votes
PRON   NOUN            VERB       VERB      NUM   NOUN

seine  Ratifizierung   erfordern  wuerde    226
PRON   NOUN            NOUN       VERB      NUM

slide-42
SLIDE 42

Linguistic features II

Picture: Wikipedia

slide-44
SLIDE 44

Pseudo-References

The “How much does it look like the Google translation?”-feature
Applicability questionable

slide-45
SLIDE 45

Back-Translation

Idea:

  1. Translate target back to source language
  2. Compare with original (using BLEU, TER)
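The comparison step can be sketched with a word-level edit distance normalized by the length of the original. This is a simplification: it corresponds to word error rate, that is, TER without block shifts, and not to BLEU. The back-translation itself has to come from an MT system, so here it is simply passed in as a string; the function names and the toy sentences are hypothetical.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two token lists (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete wa
                cur[j - 1] + 1,            # insert wb
                prev[j - 1] + (wa != wb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def back_translation_error(original, back_translated):
    """Edit distance between original and back-translation, per original word."""
    ref, hyp = original.split(), back_translated.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    return word_edit_distance(ref, hyp) / len(ref)


# Toy strings; a real back-translation would be produced by an MT system.
print(back_translation_error("a b c d", "a x c"))
```

A low error only shows that the round trip is stable, so the value is best treated as one feature among many rather than as a quality score on its own.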
slide-50
SLIDE 50

Back-Translation

Original: In Deutschland wird scheinbar kontrovers über Europas Rettungspolitik diskutiert.

slide-51
SLIDE 51

Cross-Translation

slide-52
SLIDE 52

Word level errors

Roughly: Germany is seemingly controversially discussing Europe’s bailout policy

slide-53
SLIDE 53

Word level error annotation

slide-55
SLIDE 55

Word Posterior Probabilities (WPP)

Hypothesis                                  p
Mary slapped the green witch.               0.7
Mary did slap the green witch.              0.2
It was Mary who slapped the green witch.    0.1
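From an n-best list like the one above, a word's posterior probability can be approximated by summing the normalized probabilities of all hypotheses that contain it. The sketch below is the simplest, position-independent variant and is an assumption about how the slide's numbers would be used; position-based or alignment-based variants give different values. Input is assumed to be tokenized, and the function name is made up.

```python
from collections import defaultdict


def word_posteriors(nbest):
    """Position-independent word posteriors from (hypothesis, probability) pairs."""
    total = sum(p for _, p in nbest)
    post = defaultdict(float)
    for hyp, p in nbest:
        # Each word is counted once per hypothesis, wherever it occurs.
        for word in set(hyp.split()):
            post[word] += p / total
    return dict(post)


NBEST = [
    ("Mary slapped the green witch .", 0.7),
    ("Mary did slap the green witch .", 0.2),
    ("It was Mary who slapped the green witch .", 0.1),
]

for word, p in sorted(word_posteriors(NBEST).items()):
    print(word, round(p, 2))
```

Words that appear in every hypothesis, such as "Mary", get a posterior close to 1, while "did" only collects the mass of the second hypothesis.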

slide-56
SLIDE 56

Feature Selection

Find best subset of 24 features

  • How many subsets?
slide-57
SLIDE 57

Feature Selection

Find best subset of 24 features

  • 2^24 subsets
  • Testing 1 subset takes 1 minute. How long?
slide-58
SLIDE 58

Feature Selection

Find best subset of 24 features

  • 2^24 subsets
  • Testing 1 subset takes 1 minute.
  • Wait 32 years

Feasible!
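The 32-year figure follows directly from the numbers on the slide. A short check, assuming 365-day years and strictly sequential evaluation (the function name is made up):

```python
def exhaustive_search_years(n_features, minutes_per_subset=1.0):
    """Number of feature subsets and the years needed to test them one by one."""
    n_subsets = 2 ** n_features
    minutes_per_year = 60 * 24 * 365
    return n_subsets, n_subsets * minutes_per_subset / minutes_per_year


n_subsets, years = exhaustive_search_years(24)
print(n_subsets, round(years, 1))  # 16777216 31.9
```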

slide-59
SLIDE 59

Greedy feature selection

Forward selection

  • Add the feature that gives the best improvement on the dev set

Backward selection

  • Remove the feature whose removal gives the best improvement on the dev set
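Forward selection can be sketched as a loop that keeps adding the single best feature until nothing improves the dev score. The `dev_score` callable is a stand-in: in a real setup it would train a model on the given feature subset and evaluate it on the dev set. The feature names and gains in the toy scorer are invented, and higher scores are assumed to be better.

```python
def forward_selection(features, dev_score):
    """Greedy forward selection; returns (selected features, best dev score)."""
    selected = []
    best = dev_score(selected)
    remaining = list(features)
    while remaining:
        # Score every candidate extension of the current subset.
        scored = [(dev_score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no remaining feature improves the dev score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Toy scorer: each feature adds a fixed, invented gain.
GAINS = {"lm_score": 0.3, "length": 0.2, "oov": 0.1, "noise": -0.05}


def toy_dev_score(subset):
    return sum(GAINS[f] for f in subset)


print(forward_selection(list(GAINS), toy_dev_score))
```

With 24 features this evaluates at most 24 + 23 + ... + 1 = 300 candidate subsets instead of 2^24, at the cost of possibly missing the best subset. Backward selection is the mirror image: start with all features and remove one per round.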
slide-60
SLIDE 60

Alternatives

  • Gaussian Processes
  • Sparsity inducing regularization (L1)
  • Hand picking
  • Random search

slide-61
SLIDE 61

Get your hands dirty

http://statmt.org/wmt15/quality-estimation-task.html

  • Sentence level (predict HTER)
  • Word level (predict Good/Bad)
  • Paragraph level (predict METEOR)

Submission: May 25, 2015