Towards European Translation Cloud: Development of public MT services in the Baltic countries
Andrejs Vasiļjevs, Rihards Kalniņš, tilde.com
META-FORUM 2016 | Lisbon | July 5, 2016
A Translation Cloud Scenario
Priority Theme 1: Translation in the Cloud
META-NET Strategic Research Agenda
Technology Council at META-FORUM 2012
- Europe-wide social and business networking in one's native language
- Mobile and internet services in one's native language for e-Commerce, education, travel, entertainment, etc.
- eGovernment reaching all linguistic groups and enabling political discussion across borders
- Unlimited TV/movie cross-language subtitling/interpretation
- Ever-present Personal Interpreter
- Translingual Spaces: dedicated locations for ambient interpretation
Downloadable at http://www.meta-net.eu/whitepapers/key-results-and-cross-language-comparison
European Translation Cloud
- CEF Automated Translation
- Language Resources
- Estonia: MT Services & LR
- Latvia: MT Services & LR
- Lithuania: MT Services & LR
- National MT Services & LR
LATVIA TRANSLATES WITH
Developed by
Integrated into the internal information system of the Parliament
Types and sources for data collection
Size of the created corpora
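| Corpus language (pair) | Corpus type | Domain | Corpus size (millions of sentences) |
|---|---|---|---|
| English-Latvian | Parallel | General | 5.8 |
| Latvian-Russian | Parallel | General | 5.1 |
| English-Latvian | Parallel | State administration | 3.3 |
| Latvian-Russian | Parallel | State administration | 2.0 |
| English | Monolingual | General | 50 |
| Latvian | Monolingual | General | 75 |
| Russian | Monolingual | General | 75 |
| English | Monolingual | State administration | 15 |
| Latvian | Monolingual | State administration | 25 |
| Russian | Monolingual | State administration | 24 |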
Evaluation of MT systems, domain adaptation

| Language pair | Domain | HUGO.lv (BLEU) | Google Translate (BLEU) |
|---|---|---|---|
| English-Latvian | General | 34.85 | 31.05 |
| English-Latvian | State administration | 55.58 | 26.05 |
| Latvian-English | General | 44.11 | 42.92 |
| Latvian-English | State administration | 60.93 | 28.00 |
| Latvian-Russian | General | 40.66 | 14.41 |
| Latvian-Russian | State administration | 65.88 | 19.72 |
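To make the domain-adaptation effect in the evaluation table concrete, the short Python sketch below subtracts the Google Translate score from the HUGO.lv score for each language pair and domain, then averages the gap per domain. The names `SCORES`, `bleu_gaps` and `mean_gap_by_domain` are illustrative only, and the sketch assumes both systems in each row were scored on the same test set, so the differences are directly comparable. With the published figures, the mean gap is roughly 10.4 BLEU points in the general domain and roughly 36.2 points in state administration.

```python
from statistics import mean

# BLEU scores copied from the evaluation table: (HUGO.lv, Google Translate).
# Assumption: both systems in a row were evaluated on the same test set.
SCORES = {
    ("English-Latvian", "General"): (34.85, 31.05),
    ("English-Latvian", "State administration"): (55.58, 26.05),
    ("Latvian-English", "General"): (44.11, 42.92),
    ("Latvian-English", "State administration"): (60.93, 28.00),
    ("Latvian-Russian", "General"): (40.66, 14.41),
    ("Latvian-Russian", "State administration"): (65.88, 19.72),
}


def bleu_gaps(scores):
    """Return HUGO.lv minus Google Translate BLEU for each (pair, domain)."""
    return {key: round(hugo - google, 2) for key, (hugo, google) in scores.items()}


def mean_gap_by_domain(gaps):
    """Average the per-pair BLEU gaps within each domain."""
    by_domain = {}
    for (_, domain), gap in gaps.items():
        by_domain.setdefault(domain, []).append(gap)
    return {domain: round(mean(values), 2) for domain, values in by_domain.items()}


if __name__ == "__main__":
    gaps = bleu_gaps(SCORES)
    for (pair, domain), gap in gaps.items():
        print(f"{pair:16} {domain:21} {gap:+6.2f}")
    print(mean_gap_by_domain(gaps))
```

The per-domain average is a deliberately simple summary; it weights each language pair equally and says nothing about test-set size or statistical significance.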
Analyzed specific EU Presidency terminology and added these terms to the system
Adapted MT to use the right terms in the right inflectional forms: a challenge for SMT and for morphologically rich languages
“There will be no digital single market without multilingualism, as few people feel comfortable
Robert Madelin, Chief Innovation Officer, European Commission
The platform has been nominated for the World Summit Award and the World Summit on the Information Society Prize
Lithuanian Language in the Information Society Programme (Lietuvių kalba informacinėje visuomenėje)
The purpose of the programme
To develop solutions that would help preserve the Lithuanian language in all areas of public and state life, and to provide free, easy and convenient access to these solutions by applying modern information and communication technologies.
Supported activities: collection of Lithuanian monolingual and speech corpora, development of educational training tools, open software localization, digitalization of monolingual and multilingual dictionaries, development of speech technology applications, and development of semantic analysis tools and machine translation systems.
Number of projects: 6
Consolidated web page for language resources and applications: www.raštija.lt
Development of statistical MT for the Lithuanian-English-Lithuanian and Lithuanian-French-Lithuanian language pairs
Project owner: Vilnius University
Project duration: 01.04.2012–15.05.2015
Project site: www.versti.eu
Features:
VERSTI.EU
Public Lithuanian Machine Translation
through API
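Corpora

| Domain | Corpus type | Language (pair) | Total size |
|---|---|---|---|
| General | Monolingual | Lithuanian | 856.7 M words |
| General | Monolingual | English | 1,928.9 M words |
| General | Monolingual | French | 1,163.3 M words |
| General | Parallel | English-Lithuanian | 8.1 M sentences |
| General | Parallel | French-Lithuanian | 7.0 M sentences |
| Legal | Monolingual | Lithuanian | 207.8 M words |
| Legal | Monolingual | English | 582.3 M words |
| Legal | Monolingual | French | 586.3 M words |
| Legal | Parallel | English-Lithuanian | 6.3 M sentences |
| Legal | Parallel | French-Lithuanian | 5.7 M sentences |
| IT | Monolingual | Lithuanian | 254.1 M words |
| IT | Monolingual | English | 262.1 M words |
| IT | Parallel | English-Lithuanian | 5.0 M sentences |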
Comparison of Lithuanian MT systems (BLEU)

| Language pair | VERSTI.EU | Google Translate | Microsoft Translator |
|---|---|---|---|
| French-Lithuanian | 37.57 | 17.45 | 14.33 |
| Lithuanian-French | 37.19 | 17.95 | 18.03 |
language to enable the language to successfully operate and thrive in today's information technology-based world.
from the compilation of resources to the creation of application prototypes.
translation.
https://www.keeletehnoloogia.ee/en?set_language=en
Figure: Estonian Open Parallel Corpus, composed by Tilde Eesti (2012–2014 and 2015); corpus size in millions for English-Estonian, Russian-Estonian, Latvian-Estonian and French-Estonian.
Figure: comparison of MT systems (Google Translate, Microsoft Bing, University of Tartu, Tilde LetsMT, MT@EC).
How much time is saved in public institutions with the help of MT? More than 400 public servants surveyed by Tilde MA student Lauri Valge, supervised by Martin Luts.
- Very little: 27%
- Somewhat: 15%
- Moderately: 28%
- Greatly: 21%
- Very much: 9%
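As a quick arithmetic check on the survey of public servants, the Python sketch below (the names `SURVEY` and `combined_share` are illustrative, not from the survey itself) confirms that the five answer shares cover all respondents and combines the two strongest answers: 21% + 9% = 30% of respondents reported that MT saves time "Greatly" or "Very much".

```python
# Answer shares (%) from the survey of >400 public servants.
SURVEY = {
    "Very little": 27,
    "Somewhat": 15,
    "Moderately": 28,
    "Greatly": 21,
    "Very much": 9,
}


def combined_share(answers, *labels):
    """Sum the percentage shares of the given answer options."""
    return sum(answers[label] for label in labels)


if __name__ == "__main__":
    # The five options together should account for every respondent.
    assert combined_share(SURVEY, *SURVEY) == 100
    print("Greatly or very much:", combined_share(SURVEY, "Greatly", "Very much"), "%")
```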
Significant steps are being made both by the European Commission (CEF Automated Translation) and by Member States (the Baltic examples)
Implemented only through close cooperation between the EC, Member States and the private sector
European Programme (part of the Multilingual Value Programme?)