
Towards European Translation Cloud: Development of public MT services in the Baltic countries


  1. Towards European Translation Cloud: Development of public MT services in the Baltic countries. META-FORUM 2016 | Lisbon | July 5, 2016. Andrejs Vasiļjevs, Rihards Kalniņš. tilde.com

  2. A Translation Cloud Scenario. Priority Theme 1: Translation in the Cloud. META-NET Strategic Research Agenda

  3. Vision: Applications needed by EU citizens and businesses • Europe-wide social and business networking in native language • Mobile and internet services in native language for e-Commerce, education, travel, entertainment, etc. • eGovernment reaching all linguistic groups and enabling political discussion across borders • Unlimited TV/movie cross-language subtitling/interpretation • Ever-present Personal Interpreter • Translingual Spaces: dedicated locations for ambient interpretation (Technology Council at META-FORUM 2012)

  4. Machine Translation support for European languages. Downloadable at http://www.meta-net.eu/whitepapers/key-results-and-cross-language-comparison

  5. [Diagram: European Translation Cloud. National MT Services & LR (Lithuania: MT Services & LR; Latvia: MT Services & LR; Estonia: MT Services & LR); CEF Automated Translation; Language Resources]

  6. Latvia translates with HUGO.LV • MT service for the Latvian public sector • Securely translates texts, documents, websites • Adapted for the Latvian language and public sector texts • Integrated into e-services and government websites • Languages: Latvian-English, English-Latvian, Latvian-Russian. Developed by tilde.com/mt

  7. Translation website

  8. MT integration through Translation API

  9. Adjustable for terminology, named entities, translation memories

  10. Integrated into the internal information system of the Parliament

  11. Types and sources for data collection ◦ Public corpora ◦ Web crawling ◦ Texts from publishers ◦ Data from public administration

  12. Size of the created corpora

      | Corpus language (pair) | Corpus type | Domain | Corpus size (millions of sentences) |
      |---|---|---|---|
      | English-Latvian | Parallel | General | 5.8 |
      | Latvian-Russian | Parallel | General | 5.1 |
      | English-Latvian | Parallel | State Adm. | 3.3 |
      | Latvian-Russian | Parallel | State Adm. | 2.0 |
      | English | Monol. | General | 50 |
      | Latvian | Monol. | General | 75 |
      | Russian | Monol. | General | 75 |
      | English | Monol. | State Adm. | 15 |
      | Latvian | Monol. | State Adm. | 25 |
      | Russian | Monol. | State Adm. | 24 |

  13. Evaluation of MT systems, domain adaptation (system BLEU)

      | Language pair | Domain | HUGO.lv | Google Translate |
      |---|---|---|---|
      | English-Latvian | General | 34.85 | 31.05 |
      | English-Latvian | State administration | 55.58 | 26.05 |
      | Latvian-English | General | 44.11 | 42.92 |
      | Latvian-English | State administration | 60.93 | 28.00 |
      | Latvian-Russian | General | 40.66 | 14.41 |
      | Latvian-Russian | State administration | 65.88 | 19.72 |

  14. Machine Translation for the Presidency of the Council of the EU

  15. 2015 Presidency of the Council of the EU • 1 680 Presidency events • 25 300 participants • 800+ journalists from 40 countries • 197 EU policy meetings. tilde.com/mt

  16. Adapting Hugo.lv for EU Presidency • Analyzed specific EU Presidency terminology and added these terms to the system • Adapted MT to use the right terms in the right inflectional forms, a challenge for SMT and morphologically rich languages. tilde.com/mt

  17. Mobile MT kiosk for the EU Presidency headquarters at the Latvian National Library: translation as a utility. tilde.com/mt

  18. “There will be no digital single market without multilingualism, as few people feel comfortable operating in their second or third language.” Robert Madelin, Chief Innovation Officer, European Commission

  19. The platform is nominated for the World Summit Award and the World Summit on the Information Society Prize

  20. Lithuanian Language in the Information Society Programme (Lietuvių kalba informacinėje visuomenėje)

  21. Lithuanian Language in the Information Society Programme (Lietuvių kalba informacinėje visuomenėje) • The purpose of the programme: to develop solutions that help preserve the Lithuanian language in all areas of public and state life, and to provide free, easy and convenient access to these solutions by applying modern information and communication technologies. • Supported activities: collection of Lithuanian monolingual and speech corpora, development of educational training tools, localization of open software, digitalization of monolingual and multilingual dictionaries, development of speech technology applications, and development of semantic analysis tools and machine translation systems. • Number of projects: 6 • Consolidated web page for language resources and applications: www.raštija.lt

  22. VERSTI.EU: Public Lithuanian Machine Translation. Development of statistical MT for the Lithuanian-English-Lithuanian and Lithuanian-French-Lithuanian language pairs. Project owner: Vilnius University. Project duration: 01.04.2012 – 15.05.2015. Project site: www.versti.eu

  23. VERSTI.EU: Public Lithuanian Machine Translation. Features: • Website translation • Document translation • Terminology extraction and integration into MT • Browser translation add-on • Mobile applications • Freely available web service API

  24. Mobile Apps • iOS, Android, Windows Phone • Technology: PhoneGap • Access cloud-based systems through API

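  25. VERSTI.EU: Public Lithuanian Machine Translation. Corpora: total data collected

      | Domain | Corpus type | Language (pair) | Size |
      |---|---|---|---|
      | General | Monolingual | Lithuanian | 856.7 M words |
      | General | Monolingual | English | 1 928.9 M words |
      | General | Monolingual | French | 1 163.3 M words |
      | General | Parallel | English-Lithuanian | 8.1 M sentences |
      | General | Parallel | French-Lithuanian | 7.0 M sentences |
      | Legal | Monolingual | Lithuanian | 207.8 M words |
      | Legal | Monolingual | English | 582.3 M words |
      | Legal | Monolingual | French | 586.3 M words |
      | Legal | Parallel | English-Lithuanian | 6.3 M sentences |
      | Legal | Parallel | French-Lithuanian | 5.7 M sentences |
      | IT | Monolingual | Lithuanian | 254.1 M words |
      | IT | Monolingual | English | 262.1 M words |
      | IT | Parallel | English-Lithuanian | 5.0 M sentences |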

  26. French-Lithuanian-French MT Systems. [Chart: Comparison of Lithuanian MT systems (BLEU). Systems: VERSTI.EU, Google Translate, Microsoft Translator. Series: French-Lithuanian, Lithuanian-French. Data labels in extraction order: 37.57, 37.19, 17.95, 18.03, 17.45, 14.33]

  27. ◦ Aims to achieve a level of language technology support for the Estonian language that enables the language to operate successfully and thrive in today's information technology-based world. ◦ The programme funds language technology research and development, from the compilation of resources to the creation of application prototypes. ◦ Sub-objectives and expected results of the programme include machine translation. ◦ Funding in 2016: 765 342 € for 17 projects ◦ Tilde's MT project and parallel corpora collection project. https://www.keeletehnoloogia.ee/en?set_language=en

  28. Estonian Open Parallel Corpus, composed by Tilde Eesti. [Chart: corpus size in millions, y-axis 0 to 16. Language pairs: English-Estonian, Russian-Estonian, Latvian-Estonian, French-Estonian. Series: 2012-2014, 2015. Data labels in extraction order: 2.5, 11, 2, 2, 1.5]

  29. [Chart: comparison of MT systems, y-axis 0 to 30. Systems: Google Translate, Microsoft Bing, University of Tartu, Tilde LetsMT, MT@EC. Data labels in extraction order: 24, 21, 19, 17]

  30. How much time is saved in public institutions with the help of MT? • Very much: 9% • Greatly: 21% • Moderately: 28% • Somewhat: 15% • Very little: 27%. >400 public servants surveyed by Tilde MA student Lauri Valge, supervised by Martin Luts.

  31. ◦ Although the European Translation Cloud is still a dream, significant steps are being made by both the European Commission (CEF Automated Translation) and the Member States (the Baltic examples) ◦ The original vision of the Translation Cloud can be implemented only through close cooperation between the EC, Member States and the private sector ◦ A coordinated approach is needed through a joint European Programme (part of the Multilingual Value Programme?)

  32. Thank you! andrejs@tilde.com tilde.com
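
The domain-adaptation effect reported on slide 13 can be made explicit by subtracting the Google Translate BLEU score from the HUGO.lv score in each row. The following is a minimal sketch of that arithmetic only: the scores are copied from the slide, while the names `BLEU`, `bleu_gap` and `mean_gap` are hypothetical helpers introduced here for illustration and are not part of any Tilde or HUGO.lv tooling.

```python
# BLEU scores copied from slide 13:
# (language pair, domain) -> (HUGO.lv, Google Translate)
BLEU = {
    ("English-Latvian", "General"): (34.85, 31.05),
    ("English-Latvian", "State administration"): (55.58, 26.05),
    ("Latvian-English", "General"): (44.11, 42.92),
    ("Latvian-English", "State administration"): (60.93, 28.00),
    ("Latvian-Russian", "General"): (40.66, 14.41),
    ("Latvian-Russian", "State administration"): (65.88, 19.72),
}


def bleu_gap(scores):
    """Return HUGO.lv minus Google Translate BLEU per row, rounded to 2 decimals."""
    return {key: round(hugo - google, 2) for key, (hugo, google) in scores.items()}


def mean_gap(gaps, domain):
    """Average the per-row gaps over all language pairs in one domain."""
    values = [gap for (_, dom), gap in gaps.items() if dom == domain]
    return round(sum(values) / len(values), 2)


gaps = bleu_gap(BLEU)
for (pair, domain), gap in gaps.items():
    print(f"{pair:16} {domain:21} {gap:+.2f}")

for domain in ("General", "State administration"):
    print(f"Mean gap, {domain}: {mean_gap(gaps, domain):+.2f}")
```

Under these figures the gap is consistently larger for state administration texts than for general texts, which is the pattern one would expect from a system adapted to public sector content. BLEU differences are only comparable within the same test set, so the averages are a rough summary rather than a ranking.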
