Can out-of-the-box NMT Beat a Domain-trained Moses on Technical - - PowerPoint PPT Presentation




SLIDE 1

Can out-of-the-box NMT Beat a Domain-trained Moses on Technical Data?

Anne Beyer, Vivien Macketanz, Philip Williams, Aljoscha Burchardt

EAMT Prague, 29.05.2017

SLIDE 2

Background

  • German LSP interested in MT technology
  • Moses-based system in-house
  • Gathered some experience over the last years working together with translators (some of it even positive...)

SLIDE 3

Is NMT something for us?

  • NMT is gaining more and more attention
  • Open-source toolkits are available
  • Is it worth the effort yet?
SLIDE 4

How?

  • No extensive user experience
  • Setting up a system requires additional resources
  • How to evaluate?
SLIDE 5

How?

  • No extensive user experience

→ We have to start somewhere

  • Setting up a system requires additional resources
  • How to evaluate?
SLIDE 6

How?

  • No extensive user experience

→ We have to start somewhere

  • Setting up a system requires additional resources

→ Let’s start with an existing model (Uni Edinburgh)

  • How to evaluate?
SLIDE 7

How?

  • No extensive user experience

→ We have to start somewhere

  • Setting up a system requires additional resources

→ Let’s start with an existing model (Uni Edinburgh)

  • How to evaluate?

→ Phenomenon-driven test-suite (DFKI)

SLIDE 8

What do we want to achieve?

A preliminary study on how out-of-the-box NMT performs compared to a customised SMT system

SLIDE 9

The Method

  • Phenomenon-driven approach

– Re-introducing Linguistics into MT evaluation

  • Analyse customer data with respect to linguistic phenomena

  • Select the most frequent ones
  • Compare the NMT and SMT systems’ performance on those categories
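The phenomenon-driven comparison above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ tooling: each test segment is assumed to carry one phenomenon label plus a manual correct/incorrect judgment per system, and the category names and judgments in `SEGMENTS` are invented placeholders, not results from the study.

```python
from collections import Counter

# Invented placeholder annotations, NOT data from the study:
# (phenomenon category, SMT output judged correct?, NMT output judged correct?)
SEGMENTS = [
    ("compounds", False, True),
    ("compounds", True, True),
    ("compounds", False, True),
    ("terminology", True, False),
    ("terminology", True, True),
    ("tags", True, False),
    ("negation", True, True),
]


def most_frequent_categories(segments, k):
    """Return the k phenomenon categories with the most annotated segments."""
    counts = Counter(category for category, _smt, _nmt in segments)
    return [category for category, _count in counts.most_common(k)]


def accuracy_by_category(segments, categories):
    """Share of segments per category that each system translated correctly."""
    scores = {}
    for category in categories:
        rows = [(smt, nmt) for cat, smt, nmt in segments if cat == category]
        scores[category] = {
            # True counts as 1 and False as 0 when summed
            "SMT": sum(smt for smt, _nmt in rows) / len(rows),
            "NMT": sum(nmt for _smt, nmt in rows) / len(rows),
        }
    return scores


if __name__ == "__main__":
    top = most_frequent_categories(SEGMENTS, k=2)
    for category, acc in accuracy_by_category(SEGMENTS, top).items():
        print(f"{category}: SMT {acc['SMT']:.2f}, NMT {acc['NMT']:.2f}")
```

With the placeholder data this prints `compounds: SMT 0.33, NMT 1.00` and `terminology: SMT 1.00, NMT 0.50`. Restricting the comparison to the most frequent categories keeps the manual effort focused on phenomena that actually occur in the customer’s texts; with real data, each category would need enough segments for its score to be meaningful.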

SLIDE 10

The Data

  • Selection of “real-world” customer data collected over a three-month period

  • Catalogue of technical tools
  • German → English
  • ~5,000 segments
SLIDE 11

The SMT System

  • Based on Moses toolkit
  • Trained on customer data (TM + terminology)
  • 337,600 segments
  • Specific tag handling using m4loc
SLIDE 12

The NMT System

  • Provided by the University of Edinburgh
  • Based on Nematus toolkit
  • Trained on WMT data
  • Best-performing system on the WMT’16 news translation task

SLIDE 13

Manual Evaluation Results

SLIDE 14

Example 1

SLIDE 15

Example 2

SLIDE 16

Example 3

Source: Neben den Bedingungen zur Aufstellung und Inbetriebnahme wird eine Vielzahl von technischen und gesetzlichen Anforderungen an das Lager selbst gestellt, um z. B. wassergefährdende Flüssigkeiten, Säuren und Laugen oder auch entzündbare Flüssigkeiten gesetzeskonform aufzubewahren und zu lagern.

NMT: In addition to the conditions for installation and commissioning, a wide range of technical and legal requirements will be placed on the warehouse itself in order to maintain and store, for example, water-hazardous liquids, acids and foliage, or even flammable liquids.

SLIDE 17

Conclusion

  • In this analysis, NMT outperformed SMT, even though SMT had the advantage of being trained on the customer’s own data

– BUT: tags and terminology are among the most important categories in commercial translation

  • NMT development is only getting started

→ We should start looking into this, especially together with translators!

SLIDE 18

What’s next?

  • Confirm linguistic findings with translators

– Study the usefulness of NMT pre-translation in their working environment

  • MT selection task with post-editing
  • Productivity tests
  • Look into OpenNMT

– Find ways to handle tags with NMT
– Use/add customer data in the training corpus
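For the open item on tag handling with NMT, one common, simple approach is to mask inline tags as placeholder tokens before translation and restore them afterwards. The Python sketch below is an illustration under assumptions, not a planned or tested solution: it assumes tags look like `<...>`, uses hypothetical helper names (`mask_tags`, `restore_tags`), and replaces the NMT call with a hand-written output. It is not meant to reproduce what m4loc does for the Moses system, and an NMT model not trained on such placeholders may drop, move or mistranslate them.

```python
import re

# Assumed inline-tag syntax: anything between angle brackets, e.g. <b>, </b>, <x id="1"/>.
TAG_PATTERN = re.compile(r"<[^<>]+>")


def mask_tags(segment):
    """Replace each inline tag with a numbered placeholder; return masked text and tags."""
    tags = []

    def _to_placeholder(match):
        tags.append(match.group(0))
        return f"__TAG{len(tags) - 1}__"

    return TAG_PATTERN.sub(_to_placeholder, segment), tags


def restore_tags(translation, tags):
    """Swap placeholders back to the original tags; also report tags the MT output lost."""
    lost = []
    for index, tag in enumerate(tags):
        placeholder = f"__TAG{index}__"
        if placeholder in translation:
            translation = translation.replace(placeholder, tag)
        else:
            lost.append(tag)
    return translation, lost


if __name__ == "__main__":
    masked, tags = mask_tags("Drücken Sie <b>Start</b>.")
    print(masked)  # Drücken Sie __TAG0__Start__TAG1__.
    # Stand-in for the NMT call: a hand-written translation that kept the placeholders.
    restored, lost = restore_tags("Press __TAG0__Start__TAG1__.", tags)
    print(restored, lost)  # Press <b>Start</b>. []
```

Returning the lost tags makes it possible to flag a segment for the translator instead of silently delivering output with missing markup.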

SLIDE 19

Thank you!

anne.beyer@beo-doc.de

SLIDE 20

Automatic Evaluation Results

SLIDE 21

Example 4