The Efficacy of Human Post-Editing for Language Translation

slide-1
SLIDE 1

The Efficacy of Human Post-Editing for Language Translation

Spence Green, Jeffrey Heer, Christopher D. Manning
Stanford University

CHI 2013 // 29 April 2013

slide-4
SLIDE 4

Ngarrka-ngku ka wawirri panti-rni
man kangaroo spear
“The man is spearing the kangaroo.”

slide-6
SLIDE 6

Scaling up language translation

NLP: fully automatic translation (MT)
  • Not yet human quality
HCI: collaborative and crowdsourced translation
  • Cost-effective but slow
Our work: NLP+HCI = interactive translation

slide-7
SLIDE 7

NLP+HCI: Interactive translation

[Bisbey and Kay 1972]

slide-8
SLIDE 8

Interactive MT: Caitra

[Koehn 2009]

slide-9
SLIDE 9

Interactive MT: YouTube captions

slide-11
SLIDE 11

Does interactive MT enhance productivity?

Mixed prior results
  • Faster or slower?
  • Higher or lower translation quality?
Expert translator skepticism of MT
  • Low quality?
  • You want to pay me less!?

slide-12
SLIDE 12

“Advantages” of post-editing machine translation

slide-13
SLIDE 13

Our view: MT improving rapidly

slide-19
SLIDE 19

This work: Post-editing user study

Simplest interactive MT: post-editing

Hypotheses:

  • 1. Post-edit reduces translation time
  • 2. Post-edit increases quality
  • 3. Suggestions prime the translator
  • 4. Post-edit reduces drafting

Exploratory and confirmatory analysis

slide-23
SLIDE 23

Post-editing experimental design

Task: translate an English sentence to ...
Target languages: Arabic, French, German
Conditions: unaided and post-edit
Subjects: 16 expert translators per target language

slide-26
SLIDE 26

Experimental design

Two-way, mixed design
  • Translation conditions (within subjects)
  • Source sentences (between subjects)
Two timed translation efforts separated by an untimed break
Total time: about 60 min. per subject
MT from Google [March 2012]

slide-27
SLIDE 27

Unaided UI

slide-28
SLIDE 28

Post-edit UI

slide-29
SLIDE 29

Experimental setup: Linguistic data

Topic selections from Wikipedia

  • 1. Flag of Japan (easy)
  • 2. 1896 Olympic Games (easy)
  • 3. Schizophrenia (hard)
  • 4. Infinite Monkey Theorem (hard)

One easy, one hard topic per condition
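The constraint "one easy, one hard topic per condition" can be met in several ways. The sketch below is one hypothetical counterbalancing scheme, not the study's documented procedure: it enumerates the four ways to split the topics so that each condition gets one easy and one hard topic, cycles subjects through them, and flips the condition order every four subjects. The names `pairings` and `plan_for_subject` are illustrative only.

```python
from itertools import product

# Topic difficulty labels, taken from the slide.
EASY = ["Flag of Japan", "1896 Olympic Games"]
HARD = ["Schizophrenia", "Infinite Monkey Theorem"]
CONDITIONS = ["unaided", "post-edit"]


def pairings():
    """All ways to give each condition one easy and one hard topic (4 in total)."""
    plans = []
    for e, h in product(range(2), range(2)):
        plans.append({
            CONDITIONS[0]: [EASY[e], HARD[h]],
            CONDITIONS[1]: [EASY[1 - e], HARD[1 - h]],
        })
    return plans


def plan_for_subject(subject_id):
    """Hypothetical rotation: cycle through the 4 pairings, then flip condition order.

    Returns [(condition, [easy_topic, hard_topic]), ...] in presentation order.
    """
    plans = pairings()
    plan = plans[subject_id % len(plans)]
    # Subjects 0-3 see unaided first, 4-7 post-edit first, and so on.
    order = CONDITIONS if (subject_id // len(plans)) % 2 == 0 else CONDITIONS[::-1]
    return [(c, plan[c]) for c in order]
```

Under this assumed scheme, 16 subjects per language cover each of the 8 (pairing, order) combinations exactly twice, so topic difficulty and condition order are balanced across conditions.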

slide-30
SLIDE 30

It was the first international Olympic Games held in the Modern era.

slide-31
SLIDE 31

The chance of their doing so is decidedly more favourable than the chance of the molecules returning to one half of the vessel.

slide-33
SLIDE 33

Experimental setup: Human subjects

Expert freelance translators on oDesk
  • Ecological validity
  • Fair payment: subjects bid on job
Lots of subject data
  • oDesk language skills tests
  • Hours worked per week
  • Demographic information

slide-36
SLIDE 36

Experimental setup: Quality rating

Same setup as the annual Workshop on Machine Translation
  • Crowdsourced, pairwise evaluation on MTurk
  • Three judgments per translation pair
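With three crowdsourced judgments per translation pair, the judgments must be combined before comparing conditions. The sketch below assumes each judgment simply records which translation of a pair (unaided or post-edit) was preferred, or a tie; the majority-vote rule, the tie handling, and the `Judgment` record are illustrative assumptions, not the aggregation method used in the study or at the Workshop on Machine Translation.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    pair_id: int  # hypothetical ID of one (unaided, post-edit) translation pair
    winner: str   # "unaided", "post-edit", or "tie"


def majority_winner(judgments):
    """Label chosen by a strict majority of the judgments, else 'tie'."""
    counts = {}
    for j in judgments:
        counts[j.winner] = counts.get(j.winner, 0) + 1
    label, best = max(counts.items(), key=lambda kv: kv[1])
    # With 3 judgments a strict majority means at least 2 agree.
    return label if best * 2 > len(judgments) else "tie"


def post_edit_win_rate(judgments):
    """Fraction of decided (non-tied) pairs whose majority verdict favors post-edit."""
    by_pair = {}
    for j in judgments:
        by_pair.setdefault(j.pair_id, []).append(j)
    verdicts = [majority_winner(js) for js in by_pair.values()]
    decided = [v for v in verdicts if v != "tie"]
    if not decided:
        return 0.5  # no evidence either way
    return sum(v == "post-edit" for v in decided) / len(decided)
```

A win rate above 0.5 would indicate that raters preferred post-edited translations more often than unaided ones; a significance test would still be needed on top of this descriptive number.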

slide-38
SLIDE 38

Results

slide-41
SLIDE 41

Fixed effects fallacies

Fixed effect: the data include all levels of the factor
  • Gender
  • Machine configuration
Random effect: the levels are a sample from a larger population
  • Human subjects (RM-ANOVA)
  • English source sentences
  • Target languages
“Language as fixed-effect fallacy” [Clark 1973]

slide-42
SLIDE 42

Mixed effects models

$$
y = \underbrace{x^{\top}\beta}_{\text{linear predictor}} + \underbrace{z^{\top}b}_{\text{random effects structure}} + \underbrace{\eta}_{\text{error term}}
$$
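To make the three terms of the model y = x⊺β + z⊺b + η concrete, the sketch below generates (but does not fit) data from it with crossed random intercepts for subjects and source sentences, the two sampled factors named on the fixed-effects-fallacy slide. All parameter values, the 0/1 condition coding, and the alternating condition assignment are illustrative assumptions.

```python
import random


def simulate(n_subjects, n_sentences, beta, sd_subject, sd_sentence, sd_error, seed=0):
    """Draw y = x^T beta + z^T b + eta for every (subject, sentence) cell.

    x = [1, post_edit]: an intercept plus a 0/1 condition indicator.
    z selects one subject intercept and one sentence intercept, so
    z^T b reduces to b_subject + b_sentence. Values are assumptions.
    """
    rng = random.Random(seed)
    b_subj = [rng.gauss(0.0, sd_subject) for _ in range(n_subjects)]
    b_sent = [rng.gauss(0.0, sd_sentence) for _ in range(n_sentences)]
    rows = []
    for s in range(n_subjects):
        for t in range(n_sentences):
            post_edit = (s + t) % 2  # hypothetical alternation of conditions
            x = [1.0, float(post_edit)]
            fixed = sum(xi * bi for xi, bi in zip(x, beta))  # x^T beta
            random_part = b_subj[s] + b_sent[t]              # z^T b
            eta = rng.gauss(0.0, sd_error)                   # error term
            rows.append((s, t, post_edit, fixed + random_part + eta))
    return rows
```

Setting every standard deviation to zero collapses y to the linear predictor alone, which is a quick sanity check. Fitting such a model is usually done with a mixed-effects package, for example `lmer` from R's lme4 with a formula like `y ~ condition + (1 | subject) + (1 | sentence)`.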

slide-43
SLIDE 43

Post-editor variance

slide-44
SLIDE 44

Recap: Experimental hypotheses

  • 1. Post-edit reduces translation time
  • 2. Post-edit increases quality
  • 3. Suggestions prime the translator
  • 4. Post-edit reduces drafting

slide-45
SLIDE 45

Hypothesis #1: Reduced time

slide-47
SLIDE 47

Hypothesis #1: Reduced time

Post-edit reduces translation time? Yes! p < 0.001
Significant covariates:
  • Source length
  • % nouns in sentence

slide-49
SLIDE 49

Source hover patterns predict time?

Starting in 1870, flags were created for the Japanese Emperor (then Emperor Meiji), the Empress, and for other members of the imperial family. At first, the emperor's flag was ornate, with a sun resting in the center of an artistic pattern. He had flags that were used on land, at sea, and when he was in a carriage. The imperial family was also granted flags to be used at sea and while on land (one for use on foot and one carriage flag). The carriage flags were a monocolored chrysanthemum, with 16 petals, placed in the center of a monocolored background. These flags were discarded in 1889 when the Emperor decided to use the chrysanthemum on a red background as his flag. With minor changes in the color shades and proportions, the flags adopted in 1889 are still in use by the imperial family.

A person diagnosed with schizophrenia may experience hallucinations (most reported are hearing voices), delusions (often bizarre or persecutory in nature), and disorganized thinking and speech. The latter may range from loss of train of thought, to sentences only loosely connected in meaning, to incoherence known as word salad in severe cases. Social withdrawal, sloppiness of dress and hygiene, and loss of motivation and judgment are all common in schizophrenia. There is often an observable pattern of emotional difficulty, for example lack of responsiveness. Impairment in social cognition is associated with schizophrenia, as are symptoms of paranoia; social isolation commonly occurs. In one uncommon subtype, the person may be largely mute, remain motionless in bizarre postures, or exhibit purposeless agitation, all signs of catatonia. Late adolescence and early adulthood are peak periods for the onset of schizophrenia, critical years in a young adult's social and vocational development. In 40% of men and 23% of women diagnosed with schizophrenia, the condition manifested itself before the age of 19.

The physicist Arthur Eddington drew on Borel's image further in The Nature of the Physical World (1928), writing: “If I let my fingers wander idly over the keys of a typewriter it might happen that my screed made an intelligible sentence. If an army of monkeys were strumming on typewriters they might write all the books in the British Museum. The chance of their doing so is decidedly more favourable than the chance of the molecules returning to one half of the vessel.” These images invite the reader to consider the incredible improbability of a large but finite number of monkeys working for a large but finite amount of time producing a significant work, and compare this with the even greater improbability of certain physical events. Any physical process that is even less likely than such monkeys' success is effectively impossible, and it may safely be said that such a process will never happen.

The 1896 Summer Olympics, officially known as the Games of the I Olympiad, was a multi-sport event celebrated in Athens, Greece, from 6 to 15 April 1896. It was the first international Olympic Games held in the Modern era. Because Ancient Greece was the birthplace of the Olympic Games, Athens was perceived to be an appropriate choice to stage the inaugural modern Games.

“Noun %” significant in time models
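The "Noun %" covariate is the share of noun tokens in a source sentence. The sketch below computes it from a sentence that has already been part-of-speech tagged, assuming Penn Treebank noun tags (NN, NNS, NNP, NNPS) from some external tagger. The tags in the usage example are assigned by hand for illustration, and counting punctuation in the denominator is an assumed convention, not necessarily the study's.

```python
# Penn Treebank tags for common and proper nouns, singular and plural.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def noun_percent(tagged):
    """Percentage of tokens tagged as nouns in one [(token, tag), ...] sentence.

    Punctuation tokens count toward the denominator (an assumed convention).
    """
    if not tagged:
        return 0.0
    nouns = sum(1 for _, tag in tagged if tag in NOUN_TAGS)
    return 100.0 * nouns / len(tagged)


# Hand-tagged example sentence from the 1896 Olympics topic (illustrative tags).
EXAMPLE = [
    ("It", "PRP"), ("was", "VBD"), ("the", "DT"), ("first", "JJ"),
    ("international", "JJ"), ("Olympic", "NNP"), ("Games", "NNPS"),
    ("held", "VBN"), ("in", "IN"), ("the", "DT"), ("Modern", "NNP"),
    ("era", "NN"), (".", "."),
]
```

With these hand-assigned tags, 4 of 13 tokens are nouns, so `noun_percent(EXAMPLE)` is about 30.8.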

slide-50
SLIDE 50

Hypothesis #2: Higher quality

slide-52
SLIDE 52

Hypothesis #2: Higher quality

Post-edit increases quality? Yes! p < 0.001
Significant covariates:
  • Source language proficiency test

slide-54
SLIDE 54

Hypothesis #3: Priming

Suggestions prime the translator? Yes! p < 0.001 for each language
Test setup:
  • Edit distance from each translation to the MT output
  • Paired t-test
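The test setup combines an edit distance with a paired t-test. The sketch below assumes a character-level Levenshtein distance and pairs, for each source sentence, the distance of the unaided translation to the MT output with the distance of the post-edited translation to that same output. The unit of distance, any length normalization, and the exact pairing are assumptions; a p-value additionally needs the t distribution, for example via `scipy.stats.ttest_rel`, which is omitted here to stay within the standard library.

```python
import math
from statistics import mean, stdev


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def paired_t(xs, ys):
    """Paired t statistic and degrees of freedom for matched samples.

    Undefined (division by zero) when all differences are identical.
    """
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1
```

If suggestions prime translators, post-edited output should sit closer to the MT than unaided output, giving a positive t when the unaided distances are passed as `xs`.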

slide-55
SLIDE 55

Hypothesis #4: Less drafting

  • Unaided condition
  • Post-edit condition

slide-57
SLIDE 57

Hypothesis #4: Less drafting

Post-edit results in less drafting? Yes! p < 0.01
Post-edit condition behavior:
  • Fewer, longer pauses
  • Pauses are a larger % of total time
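The pause findings can be computed from keystroke logs. The sketch below assumes each translation yields a sorted list of keystroke timestamps in milliseconds and treats any inter-keystroke gap of at least 1000 ms as a pause; both the log format and the 1000 ms cutoff are hypothetical choices, not the study's definitions.

```python
def pause_stats(timestamps_ms, threshold_ms=1000):
    """Return (pause count, mean pause length in ms, pause share of total time).

    timestamps_ms: sorted keystroke times for one translation.
    A pause is any gap >= threshold_ms (an assumed cutoff).
    """
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    pauses = [g for g in gaps if g >= threshold_ms]
    total = timestamps_ms[-1] - timestamps_ms[0] if len(timestamps_ms) > 1 else 0
    count = len(pauses)
    mean_len = sum(pauses) / count if count else 0.0
    share = sum(pauses) / total if total else 0.0
    return count, mean_len, share
```

Comparing these three numbers between the unaided and post-edit conditions corresponds to the "fewer, longer pauses" and "larger % of total time" observations.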

slide-61
SLIDE 61

Conclusions

  • Simple source lexical features predict time
  • Post-edit → different interaction patterns
  • Suggestions prime the translator
  • Post-edit improves speed and quality

slide-62
SLIDE 62

The Efficacy of Human Post-Editing for Language Translation

Spence Green, Jeffrey Heer, and Christopher D. Manning

Data and code are available at:
  • vis.stanford.edu
  • nlp.stanford.edu
  • spencegreen.com