Towards European Translation Cloud: Development of public MT services




SLIDE 1

Towards European Translation Cloud: Development of public MT services in the Baltic countries

Andrejs Vasiļjevs, Rihards Kalniņš
tilde.com

META-FORUM 2016 | Lisbon | July 5, 2016

SLIDE 2

A Translation Cloud Scenario

Priority Theme 1: Translation in the Cloud

META-NET Strategic Research Agenda

SLIDE 3

Technology Council at META-FORUM 2012

Vision: Applications needed by EU citizens and businesses

  • Europe-wide social and business networking in native language
  • Mobile and internet services in native language for e-Commerce, education, travel, entertainment, etc.
  • eGovernment reaching all linguistic groups and enabling political discussion across borders
  • Unlimited TV/movie cross-language subtitling/interpretation
  • Ever-present Personal Interpreter
  • Translingual Spaces: dedicated locations for ambient interpretation

SLIDE 4

Machine Translation support for European languages

Downloadable at http://www.meta-net.eu/whitepapers/key-results-and-cross-language-comparison

SLIDE 5

European Translation Cloud

CEF Automated Translation Language Resources

  • Estonia MT Services & LR
  • Latvia MT Services & LR
  • Lithuania MT Services & LR
  • National MT Services & LR

SLIDE 6

LATVIA TRANSLATES WITH HUGO.LV

  • MT service for the Latvian public sector
  • Securely translates texts, documents, websites
  • Adapted for the Latvian language and public sector texts
  • Integrated into e-services and government websites
  • Languages
  • Latvian-English
  • English-Latvian
  • Latvian-Russian

Developed by: tilde.com/mt
SLIDE 7

Translation website

SLIDE 8

MT integration through Translation API

SLIDE 9
SLIDE 10

Adjustable for terminology, named entities, translation memories

SLIDE 11

Integrated into the internal information system of the Parliament

SLIDE 12

Types and sources for data collection

  • Public Corpora
  • Crawling Web
  • Texts from Publishers
  • Data from public administration
SLIDE 13

Size of the created corpora

Corpus language (pair)   Corpus type   Domain                 Corpus size (millions of sentences)
English-Latvian          Parallel      General                5.8
Latvian-Russian          Parallel      General                5.1
English-Latvian          Parallel      State administration   3.3
Latvian-Russian          Parallel      State administration   2.0
English                  Monolingual   General                50
Latvian                  Monolingual   General                75
Russian                  Monolingual   General                75
English                  Monolingual   State administration   15
Latvian                  Monolingual   State administration   25
Russian                  Monolingual   State administration   24

SLIDE 14

Evaluation of MT systems, domain adaptation

Language pair     Domain                 HUGO.lv (BLEU)   Google Translate (BLEU)
English-Latvian   General                34.85            31.05
English-Latvian   State administration   55.58            26.05
Latvian-English   General                44.11            42.92
Latvian-English   State administration   60.93            28.00
Latvian-Russian   General                40.66            14.41
Latvian-Russian   State administration   65.88            19.72
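
The size of the domain-adaptation effect is easier to see as a per-row difference. The following is a minimal sketch in Python that only does arithmetic on the BLEU scores listed above; the names `BLEU`, `bleu_gaps` and `mean_gap` are illustrative and not part of HUGO.lv or any Tilde tool.

```python
# BLEU scores copied from the evaluation table above:
# (language pair, domain) -> (HUGO.lv, Google Translate)
BLEU = {
    ("English-Latvian", "General"): (34.85, 31.05),
    ("English-Latvian", "State administration"): (55.58, 26.05),
    ("Latvian-English", "General"): (44.11, 42.92),
    ("Latvian-English", "State administration"): (60.93, 28.00),
    ("Latvian-Russian", "General"): (40.66, 14.41),
    ("Latvian-Russian", "State administration"): (65.88, 19.72),
}


def bleu_gaps(scores):
    """Return HUGO.lv minus Google Translate BLEU per row, rounded to 2 decimals."""
    return {key: round(hugo - google, 2) for key, (hugo, google) in scores.items()}


def mean_gap(scores, domain):
    """Average BLEU gap over all language pairs of one domain, rounded to 2 decimals."""
    gaps = [gap for (_, dom), gap in bleu_gaps(scores).items() if dom == domain]
    return round(sum(gaps) / len(gaps), 2)


if __name__ == "__main__":
    for (pair, domain), gap in bleu_gaps(BLEU).items():
        print(f"{pair:16} {domain:21} {gap:+.2f}")
    print("Mean gap, General:", mean_gap(BLEU, "General"))                            # 10.41
    print("Mean gap, State administration:", mean_gap(BLEU, "State administration"))  # 36.21
```

On these figures the average gap is about 10 BLEU points on general text and about 36 points on state administration text, which is the domain the system was adapted for.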

SLIDE 15

Machine Translation for the Presidency of European Council

SLIDE 16

2015 Presidency of the Council of the EU

  • 1 680 Presidency events
  • 25 300 participants
  • 800+ journalists from 40 countries
  • 197 EU policy meetings

tilde.com/mt
SLIDE 17
SLIDE 18

Adapting Hugo.lv for EU Presidency

  • Analyzed specific EU Presidency terminology, added these terms to the system
  • Adapted MT to use the right terms in the right inflectional forms, a challenge for SMT and morphologically rich languages

tilde.com/mt
SLIDE 19
SLIDE 20
SLIDE 21
SLIDE 22

Mobile MT kiosk for the EU Presidency headquarters at the Latvian National Library: translation as a utility

tilde.com/mt
SLIDE 23
SLIDE 24
SLIDE 25
SLIDE 26
SLIDE 27
SLIDE 28

“There will be no digital single market without multilingualism, as few people feel comfortable operating in their second or third language.”

Robert Madelin, Chief Innovation Officer, European Commission

SLIDE 29

The platform is nominated for the World Summit Award and World Summit on the Information Society Prize

SLIDE 30

Lithuanian Language in the Information Society Programme
Lietuvių kalba informacinėje visuomenėje

SLIDE 31

Lithuanian Language in the Information Society Programme
Lietuvių kalba informacinėje visuomenėje

The purpose of the program

To develop solutions that would help to preserve the Lithuanian language in all areas of public and state life, and to provide free, easy and convenient access to these solutions by applying modern information and communication technologies.

Supported activities

Lithuanian monolingual and speech corpora collection, educational training tools development, open software localization, monolingual and multilingual dictionaries digitalization, speech technologies applications development, semantic analysis tools and machine translation systems development.

Number of projects: 6

Consolidated language resources and applications web page: www.raštija.lt

SLIDE 32

VERSTI.EU

Public Lithuanian Machine Translation

Development of statistical MT for Lithuanian-English-Lithuanian and Lithuanian-French-Lithuanian language pairs

Project owner: Vilnius University
Project duration: 01.04.2012 – 15.05.2015
Project site: www.versti.eu

SLIDE 33

VERSTI.EU

Public Lithuanian Machine Translation

Features:

  • Web sites translation
  • Document translation
  • Terminology extraction and integration into MT
  • Browser translation add-on
  • Mobile applications
  • Freely available web service API

SLIDE 34
SLIDE 35
SLIDE 36

Mobile Apps

  • iOS, Android, Windows Phone
  • Technology: PhoneGap
  • Access cloud-based systems through API

SLIDE 37

VERSTI.EU

Public Lithuanian Machine Translation

Data Collected

Domain    Type          Corpora               Total
General   Monolingual   Lithuanian            856.7 M words
General   Monolingual   English               1 928.9 M words
General   Monolingual   French                1 163.3 M words
General   Parallel      English-Lithuanian    8.1 M sentences
General   Parallel      French-Lithuanian     7.0 M sentences
Legal     Monolingual   Lithuanian            207.8 M words
Legal     Monolingual   English               582.3 M words
Legal     Monolingual   French                586.3 M words
Legal     Parallel      English-Lithuanian    6.3 M sentences
Legal     Parallel      French-Lithuanian     5.7 M sentences
IT        Monolingual   Lithuanian            254.1 M words
IT        Monolingual   English               262.1 M words
IT        Parallel      English-Lithuanian    5.0 M sentences

SLIDE 38

French-Lithuanian-French MT Systems

Chart: Comparison of Lithuanian MT systems (BLEU) for French-Lithuanian and Lithuanian-French. Systems: VERSTI.EU, Google Translate, Microsoft Translator. Data labels: 37.57, 17.45, 14.33, 37.19, 17.95, 18.03.

SLIDE 39
SLIDE 40

  • Aims to achieve a level of language technology support for the Estonian language to enable the language to successfully operate and thrive in today's information technology-based world.
  • The programme funds language technology research and development, from the compilation of resources to the creation of application prototypes.
  • Sub-objectives and expected results of the programme include machine translation.
  • Funding in 2016: 765 342 € for 17 projects
  • Tilde's MT project and parallel corpora collection project.

https://www.keeletehnoloogia.ee/en?set_language=en

SLIDE 41

Estonian Open Parallel Corpus Composed by Tilde Eesti

Chart: corpus size in millions for English-Estonian, Russian-Estonian, Latvian-Estonian and French-Estonian. Series: 2012-2014, 2015. Data labels: 11, 2, 1.5, 2.5, 2.

SLIDE 42

Chart. Systems: Google Translate, Microsoft Bing, University of Tartu, Tilde LetsMT, MT@EC. Data labels: 21, 17, 19, 24.

SLIDE 43
SLIDE 44

HOW MUCH TIME IS SAVED IN PUBLIC INSTITUTIONS WITH THE HELP OF MT?

  • Very little: 27%
  • Somewhat: 15%
  • Moderately: 28%
  • Greatly: 21%
  • Very much: 9%

>400 public servants surveyed by Tilde MA student Lauri Valge, supervised by Martin Luts.
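
The survey shares above can be added up to see how many respondents report at least a moderate time saving. This is a small illustrative sketch; the names `SURVEY` and `share_at_least` are made up for the example, and the only data used are the five percentages from the slide.

```python
# Response shares in percent, copied from the survey slide above,
# ordered from the smallest to the largest reported time saving.
SURVEY = {
    "Very little": 27,
    "Somewhat": 15,
    "Moderately": 28,
    "Greatly": 21,
    "Very much": 9,
}


def share_at_least(responses, threshold):
    """Sum the shares of the threshold answer and every answer ranked above it."""
    labels = list(responses)          # dicts keep insertion order (Python 3.7+)
    start = labels.index(threshold)   # raises ValueError for an unknown label
    return sum(responses[label] for label in labels[start:])


if __name__ == "__main__":
    print(sum(SURVEY.values()))                  # 100
    print(share_at_least(SURVEY, "Moderately"))  # 58
    print(share_at_least(SURVEY, "Greatly"))     # 30
```

Read this way, 58% of the surveyed public servants report that MT saves them time at least moderately.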

SLIDE 45

  • Although the European Translation Cloud is still a dream, significant steps are being made by both the European Commission (CEF Automated Translation) and Member States (the Baltic examples)
  • The original vision of the Translation Cloud can be implemented only through close cooperation between the EC, Member States and the private sector
  • A coordinated approach is needed through a joint European Programme (part of the Multilingual Value Programme?)

SLIDE 46

Thank you!

andrejs@tilde.com
tilde.com