SLIDE 1

Active Curation of Bi-Text Resources in Commercial Localization Workflows

Dave Lewis (TCD), Andrzej Zydroń (XTM International)

SLIDE 2
The Localization Web

  • Open Data on the Web: W3C Semantic Web standards allow data to be published on the Web
    – Fine-grained URI-based inter-linking
    – Extensible meta-data
    – Standard Query APIs
  • Enables a Localization Web
    – Terms and translations become linkable resources
    – Meta-data from L10n workflows adds value
    – Leverage in training Machine Translation and Automatic Term Extraction

The Localization Web = Decentralised Annotated Global Translation Memory and Term Base
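
To make "terms and translations become linkable resources" concrete, here is a minimal sketch of one term published as a JSON-LD node with a label per language. It is an illustration only, not the format used by the presenters: the `example.org` URI and the helper names `term_resource` and `translations` are hypothetical, and SKOS is assumed as the labelling vocabulary.

```python
import json

# SKOS is a W3C vocabulary for concepts and their labels (assumed here for illustration).
SKOS = "http://www.w3.org/2004/02/skos/core#"


def term_resource(uri, labels):
    """Build a JSON-LD node for one concept with one preferred label per language."""
    return {
        "@context": {"skos": SKOS},
        "@id": uri,
        "@type": "skos:Concept",
        "skos:prefLabel": [
            {"@value": text, "@language": lang}
            for lang, text in sorted(labels.items())
        ],
    }


def translations(node, lang):
    """Return the labels of a concept node in the requested language."""
    return [
        label["@value"]
        for label in node["skos:prefLabel"]
        if label["@language"] == lang
    ]


concept = term_resource(
    "http://example.org/terms/production-capacity",  # hypothetical URI
    {"en": "production capacity", "fr": "capacité de production"},
)
print(json.dumps(concept, ensure_ascii=False, indent=2))
```

Because the term has its own URI, workflow meta-data (who validated it, in which project) can be attached to it by other datasets without copying the term itself.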

SLIDE 3

Web of Multilingual Content

SLIDE 4

Domain Terminology

SLIDE 5
Babelfy: Public Lexical Resources

  • Rich word and phrase resources to assist translators

SLIDE 6
Links to BabelNet offer suggestions for Definitions and Translations

  • Translation suggestions can be fed into MT for more reliable translation

SLIDE 7

Babelfy & BabelNet offer more term suggestions

SLIDE 8
  • Public resources may not always yield the right definitions or translations for the context
  • Need to track human validation/rejection to train automatic term extraction

SLIDE 9

Active Curation of Linked Language Resources

  • Extraction & Segmentation
    – The company has also reduced its production capacity by ceasing manufacture of chest freezers and freestanding microwave ovens
  • Annotation with Existing Terms
    – production capacity → capacité de production (✔)
  • Auto suggestion from Babelfy/BabelNet
    – chest freezer → réfrigérateur (?)
    – microwave oven → four à micro-onde (?)
  • Machine Translate with Term Translations (MT Vendor)
    – D'autre part, la société a réduit sa capacité de production en arrêtant la production de réfrigérateur et de fours micro-onde pose-libre (?)
  • Postedit (PE) and capture terms in context
    – D'autre part, la société a réduit sa capacité de production en arrêtant la production de congélateurs coffres et de fours micro-ondes pose-libre
    – chest freezer → congélateurs coffres (✔)
    – microwave oven → fours micro-ondes (✔)
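
The curation cycle on this slide can be sketched in a few lines: term candidates start as pending, and the post-edited segment decides which are validated, which are rejected, and which corrected terms are captured. This is a rough sketch under stated assumptions, not the XTM implementation: `TermCandidate` and `capture_postedit` are hypothetical names, and matching is plain case-insensitive substring search, where a real system would need lemmatisation and alignment.

```python
from dataclasses import dataclass


@dataclass
class TermCandidate:
    source: str
    target: str
    origin: str               # e.g. "termbase" or "auto-suggestion"
    status: str = "pending"   # pending | validated | rejected


def capture_postedit(candidates, postedited_text, corrections):
    """Update candidate statuses from one post-edited segment.

    `corrections` maps a source term to the target term the post-editor used.
    Returns the new, validated term pairs captured from the post-edit.
    """
    text = postedited_text.lower()
    captured = []
    for cand in candidates:
        if cand.target.lower() in text:
            # The suggested translation survived post-editing.
            cand.status = "validated"
        else:
            # The post-editor replaced it: reject and record the correction.
            cand.status = "rejected"
            fix = corrections.get(cand.source)
            if fix and fix.lower() in text:
                captured.append(
                    TermCandidate(cand.source, fix, "post-edit", "validated")
                )
    return captured


candidates = [
    TermCandidate("production capacity", "capacité de production", "termbase"),
    TermCandidate("chest freezer", "réfrigérateur", "auto-suggestion"),
    TermCandidate("microwave oven", "four à micro-onde", "auto-suggestion"),
]
postedited = (
    "D'autre part, la société a réduit sa capacité de production en arrêtant "
    "la production de congélateurs coffres et de fours micro-ondes pose-libre"
)
new_terms = capture_postedit(
    candidates,
    postedited,
    {"chest freezer": "congélateurs coffres", "microwave oven": "fours micro-ondes"},
)
for term in candidates + new_terms:
    print(term)
```

The rejected suggestions are kept rather than discarded, since the slides note that both validation and rejection are needed to train automatic term extraction.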

SLIDE 10
Linked Data Based on W3C Standards

  • CSV on the Web: tables and JSON meta-data
  • JSON-Linked Data
  • Provenance Vocabulary
  • Data Catalogue
  • Open Annotation
  • ITS2.0 Vocabulary
  • Also:
    – Provenance Plan
    – Open Digital Rights Language
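
As one illustration of how these vocabularies combine, the sketch below records the provenance of a post-edited segment as JSON-LD using terms from the W3C PROV ontology. The function name `postedit_provenance` and all `example.org` URIs are assumptions made for the example; the slides do not specify the actual data model.

```python
import json

# Namespace of the W3C PROV ontology.
PROV = "http://www.w3.org/ns/prov#"


def postedit_provenance(postedit_uri, mt_output_uri, editor_uri):
    """Describe a post-edited segment as derived from MT output and attributed to an editor."""
    return {
        "@context": {"prov": PROV},
        "@id": postedit_uri,
        "@type": "prov:Entity",
        "prov:wasDerivedFrom": {"@id": mt_output_uri},
        "prov:wasAttributedTo": {"@id": editor_uri},
    }


record = postedit_provenance(
    "http://example.org/segments/42/postedit",   # hypothetical URIs
    "http://example.org/segments/42/mt",
    "http://example.org/people/posteditor-7",
)
print(json.dumps(record, indent=2))
```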

SLIDE 11

Language Lifecycle Dependencies

[Diagram: Language Resources, Language Workers, Language Technology]

SLIDE 12

Active Curation: Dynamic MT Retraining

[Diagram: Parallel Text & Term base, Posteditors, Machine Translation]

  • Tighten curation cycle: from projects to segments
    – Prioritise postedits for retraining
  • Prioritise Term Identification by posteditors
  • Assemble MT-ready, lexically-rich term bases
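
One plausible way to prioritise postedits for retraining is to rank segments by how much the post-editor changed the MT output. The slides do not say which measure is used, so the sketch below assumes a simple token-level dissimilarity computed with Python's standard `difflib`; `edit_effort` and `prioritise` are hypothetical names.

```python
from difflib import SequenceMatcher


def edit_effort(mt_output, postedit):
    """Return 1 - token-level similarity: 0.0 means untouched, 1.0 means fully rewritten."""
    matcher = SequenceMatcher(None, mt_output.split(), postedit.split())
    return 1.0 - matcher.ratio()


def prioritise(segments):
    """Sort (mt_output, postedit) pairs so the most heavily edited come first."""
    return sorted(segments, key=lambda pair: edit_effort(*pair), reverse=True)


segments = [
    ("la production de réfrigérateur", "la production de réfrigérateur"),
    ("la production de réfrigérateur", "la production de congélateurs coffres"),
    ("four à micro-onde", "fours micro-ondes pose-libre"),
]
for mt_output, postedit in prioritise(segments):
    print(f"{edit_effort(mt_output, postedit):.2f}  {mt_output!r} -> {postedit!r}")
```

Working at segment level rather than project level means heavily corrected segments can be fed back to the MT engine as soon as they are post-edited.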

SLIDE 13
Next Generation Machine Translation

  • TermWeb/XTM/DCU
  • Introducing Next Gen Machine Translation
  • Massive scale bilingual dictionaries
  • BabelNet
  • Automatic Term Extraction: forced decoding
  • Dynamic retraining
  • Optimal segment translation route
  • L3Data curation, sharing

SLIDE 14

Data Management Lifecycles

[Diagram: three linked lifecycles]

  • Lex-concept lifecycle: Discover & use, Correct & refine, Publish
  • Bitext lifecycle: Discover data, (Re)train MT, Revise and annotate, Publish
  • Content lifecycle: Create, I18n & source QA, Automated translation, Post-edit, Trans QA, Publish, Consume

SLIDE 15
Systems Integration

  • Better in-context postediting:
    – XTM-Easyling
  • Feeding term suggestions from posteditor to Terminology Management
    – XTM-Interverbum
  • Dynamic Retraining
    – XTM-DCU
  • Bilingual Dictionary SMT improvements
    – XTM-DCU
  • NER, terminology enforcements, forced decoding
    – XTM-Interverbum-DCU
  • Postediting prioritisation and term flagging
    – TCD-DCU-XTM
  • Publishing interlinks of parallel text, lexically rich term bases
    – TCD: DG-T TM, EuroVoc, Snomed-CT, LEMON, BabelNet
  • Closing the loop – operational instrumentation of postediting
    – XTM